A systematic framework for jet definition is developed from first principles of physical
measurement, quantum field theory, and QCD. 
A jet definition is found which: 

• is theoretically optimal with regard to both minimization of detector errors and inversion of
hadronization;

• is similar to a cone algorithm with dynamically negotiated jet shapes and positions found via shape 
observables that generalize the thrust to any number of axes; 

• involves no ad hoc conventions; 

• allows a fast computer implementation [hep-ph/9912415].

The framework offers an array of options for systematic construction of quasi-optimal observables for 
specific applications. 



The second edition:

• clarifies, expands and solidifies the arguments behind the formal derivation;

• strengthens consistency of the derivation at the last step -> fine-tunes the final criterion
-> the jet search algorithm is now much simpler, faster and more robust [7];

• uncovers new options not available in conventional schemes.

Eliminated:

• the algorithmically cumbersome linear restriction on missing energy; it is now treated
additively with a cumulative upper bound (6.21).

Added: 

• a general theory of optimal observables (2.7-2.35); 

• an analysis of inversion of hadronization (5.10); 

• the relation to cone algorithms and thrust (8.11, 8.14);

• a model-independent tool to quantify hadronization (the Y-E_soft distribution; 8.19);

• the option of multiple jet configurations (9). 



Related materials (code etc.) are available at http://www.inr.ac.ru/~ftkachov/projects/jets/index.htm 
Introduction 1

Jet finding algorithms are a key tool in high-energy physics 
[1], and the problem of quantitative description of the structure 
of multi-hadron final states remains at the focus of physicists' 
attention (cf. e.g. [2]). 

This paper continues the systematic investigation of quan- 
titative description of multijet structure from first principles of 
physical measurements and quantum field theory undertaken in 
[3]-[5]. Our purpose here is to complete the analysis of [4]
with regard to jet algorithms.^a We are going to develop a
systematic theory of jet definition and derive a jet finding
criterion, the so-called optimal jet definition (OJD), summarized
in Sec. 7.16.

It is optimal in a well-defined sense — the sense which is 
ignored in the conventional deliberations about jet algorithms. 
Namely, the new principle on which the presented theory of jet 
definition is based is that the configuration of jets must inherit 
maximum physical information from the original event 
(Sec. 5.6). 

Now the first difficulty (besides realizing its importance) is 
to give that axiom a systematic quantitative form. This is what 
the first part of the present work (Sections 2-4) deals with. 

In the second part (Sections 5-7) we derive the OJD, which is
summarized in Sec. 7.16.

The third part (Sections 8-11) investigates the definition. 

A more detailed description of the content is given in 
Sec. 1.5. 

The focus in this paper is on the analytical theory of the 
criterion and the underlying principles. Its software implemen- 
tation is discussed in a separate publication [7]. A detailed 
numerical investigation of OJD requires a separate project. 

Also beyond the scope of the present work are complete 
formal proofs of the background propositions of Section 3 (this 
especially concerns the arguments in Sec. 4.1): the purpose 
here is to uncover and clearly formulate the assumptions in- 
volved and to devise a formulaic way to talk about jet finding 
with hand-waving minimized. What may appear as fancy 
mathematical formulations is primarily intended as an invita- 
tion to mathematical physicists to fill in the remaining gaps. 

Notations agree with [4] but the present paper is self- 
contained in this respect. 

The theoretical attitude which permeates this work is that 
jets are not partons but a data processing tool motivated by the 
partonic structure of QCD dynamics at high energies [4]. Such 
a shift of emphasis allows one to remove artificial restrictions 
in the design of data processing algorithms. 

In view of the importance of the subject and the prevailing
prejudice that the definition of jets is a matter of subjective
preference (which, as the theory developed in the present paper
shows, may not be quite true), I prefer to tax the reader's
patience by explicitly stating trivial things (and even putting
them in boxes) rather than leave an important axiom out of the
picture.



^a The 2 → 1 recombination version of the optimal jet definition was
discussed in [6]; see Sec. 10.11 of the present paper. It is interesting to
observe how the popularity of recombination schemes (which of course is due
to their simplicity) led astray the study of jet algorithms within the
framework of [4], which per se provides no motivation for considering 2 → 1
recombinations. This is not the first time that I realize, on ex post facto
examination, that the hardest part in solving seemingly intractable
problems is invariably to escape the psychological traps created by
quasi-solutions.



Numbering and formatting 1.1 

The two-level numbering conventions of [4] are adopted to 
facilitate searches of cross-referenced items: sub(sub)sections, 
equations, figures, tables, and textual propositions are num- 
bered consecutively within sections. Section headings are set in 
bold type. 

Sub- and subsubsection headings differ only by formatting 
(solid and dotted underlining, respectively). 

Underlined italic type indicates an important term being de- 
fined. The underlining helps the eye to find definitions in the 
body of the text. The meaning of such terms in the context of 
our theory is usually narrowed compared with the conventional 
usage. 



Double boxes enclose conceptually important propositions, 
which may be numbered.

1.2 



Simple solid boxes contain formulas and propositions which are 
part of the optimal jet definition (OJD) and related algorithmic op- 
tions. 

1.3 



Dotted boxes denote important formulas and propositions. 

1.4 



• Bullets indicate further options, important asides, etc. 

The reader is invited to begin to read this text by browsing 
through it using boxes and bullets as visual clues. 

Plan 1.5

The paper can be roughly divided into three parts. 

The first part (Sections 2-4) is preparatory and devoted to a
clarification of some general issues pertaining to data process- 
ing procedures of which jet algorithms are a part. 

The second part (Sections 5-7) is devoted to the derivation 
of the optimal jet definition. 

The third part (Sections 8-11) investigates the OJD.

Section 2 is devoted to a clarification of the relevant issues
of mathematical statistics (this perhaps should have been done
already in [4]). The reasoning is general and practically no
specifics of high-energy physics are invoked. We introduce the
notion of (quasi-) optimal observables for measurements of
fundamental parameters such as $\alpha_s$, $M_W$, etc. (Sec. 2.7).
Such observables allow one in principle to reach the best possible
precision for fundamental parameters. The resulting practical
prescriptions (Sec. 2.25) improve upon the usual signal-vs.-noise
considerations, an important new ingredient here being the
notion of regularization of discontinuities (Sec. 2.52). The
prescriptions of Section 2 motivate new ways of using jet
algorithms, some of which are described in Section 9.

Section 3 is essentially a clarification of the arguments of 
[4] in the light of the results of Section 2. It discusses the
"kinematical" properties of observables (their so-called C- 
continuity [4]) which ensure their optimal sensitivity to errors 
and amenability to theoretical studies. C-continuity is de- 
scribed using a special distance among events (Sec. 3.23). The 
arguments here culminate in a quantitative description of the 
event's physical information content (Sec. 3.42) which serves 
as a formal starting point for a subsequent derivation of
kinematical jet definition.

Section 4 investigates the specific structure of QCD prob- 
ability densities. The purpose is to clarify the logical connec- 
tion between the notions of C-continuity and IR safety (the 
former turns out to be a non-perturbative reformulation of the 
latter). This solidifies the conjecture of [8] concerning the pos- 
sibility to confront perturbative calculations with hadronic data 
for IR safe observables (cf. Eq.4.2). Then a formal description 
of hadronization is introduced (Eq.4.12) to prepare ground for 
a subsequent study of dynamical aspects of our jet definition. 
A formal construction of optimal observables which takes into 
account the hadronization model (Eq.4.20) provides a refer- 
ence point for the constructions of observables based on jet al- 
gorithms. The conventional scheme for that is discussed in 
Sec. 4.28. 

Section 5 discusses jet definitions. First in Sec. 5.1 the im- 
plicit conventional definition of an ideal jet algorithm is inves- 
tigated. Then Sec. 5.6 introduces a definition of jets rooted in 
the formalism of the preceding sections. We then exhibit its 
connection with the concept of inversion of hadronization 
(Sec. 5.10). Then a quantitative version of the jet definition is
described (Sec. 5.17). It is based on inequalities of a factorized
form (Eq.5.18) which estimate the loss of physical information 
content in the transition from events to jet configurations. We 
discuss how different jet algorithms can be compared on the 
basis of how well they conserve the information (Sec. 5.26). 
Then a universal dynamics-agnostic variant of jet definition is 
introduced (Sec. 5.27), and in Sec. 5.31 we explain how it can 
be modified to include dynamical information. 

Section 6 is technical and devoted to the derivation of the 
factorized estimate. The main trick is the so-called recombina- 
tion matrix (Sec. 6.1); finding the configuration of jets is 
equivalent to finding that matrix. The matrix can be regarded 
as a cumulative variant of the entire sequence of 2 → 1
recombinations in the conventional recombination jet finding scheme
(cf. also Sec. 10.11) but now all particles are, so to say,
recombined into jets democratically. (In this respect, OJD is
equivalent to a prescription for determining the order of
recombinations.)

In Section 7, the remaining ambiguities are fixed in such a
way as to ensure maximal computational convenience,
momentum conservation, and Lorentz covariance. We consider
both the spherical kinematics (c.m.s. annihilation of $e^+e^-$ pairs
into hadrons) and hadron collisions kinematics (a
boost-invariant formulation).

Section 8 clarifies the mechanism of the obtained jet defi- 
nition and establishes its connection with shape observables of 
the conventional type (Sec. 8.11). Then we present simple
analytical arguments which show that OJD is essentially a cone 
algorithm with dynamically determined positions and shapes of 
jet cones (Sec. 8.14). 

An important tool we obtain as a by-product is the so-called
Y-E_soft distribution (Sec. 8.19). It allows one to quantify the
mechanism of hadronization in a model-independent fashion. 

Section 9 considers issues for whose discussion the
conventional schemes offer no framework whatever, namely,
the problem of non-uniqueness of jet configurations, which in
the case of OJD takes the form of the problem of multiple
minima. The options naturally offered by the developed theory
allow one to go beyond the restrictions of the conventional data
processing scheme based on jet algorithms (4.38).



Section 10 compares OJD with the conventional cone and 
recombination schemes. We discuss the vicious circle in the 
conventional jet definitions (no principle to fix the initial cone 
configuration/order of recombinations; 10.2). Also derived is a 
curious variant of OJD (Eq. 10.7) which corresponds to the 
original cone algorithm of [8] rewritten in terms of IR safe 
shape observables. In Sec. 10.11 we discuss the connection of
OJD with the conventional 2 → 1 recombination criteria.

A general conclusion is that the mechanism of OJD is rather 
similar to the conventional cone algorithms. 

Section 11 summarizes our findings.

Optimal observables, continuity and regularizations 2

For a meaningful discussion of jet algorithms it is essential 
to regard them as a special case of general data processing pro- 
cedures. With that in mind, below are listed some basic facts of 
mathematical statistics which emerged as necessary for a sys- 
tematic clarification of the issue of jet definition. Although the 
high-energy physics background affected the terminology and 
emphasis of the presentation below, it deals essentially with 
elementary notions of parameter estimation. However, the im- 
portant prescription we arrive at in Sec. 2.25 seems to be 
missing from textbooks. 

Some generalities 2.1 

One deals with a random variable P whose instances 
(specific values) are called events. Throughout most of this 
section, the nature of events P can be anything: they can be 
random points on the real axis or random measures on the unit 
sphere. 

One always deals with a finite collection of experimentally
observed events $\{P_i\}_i$. In the context of applications of interest
to us, events are obtained via rather complex measurement
procedures, so that their probability distribution $\pi(P)$ reflects
experimental imperfections.

Experimental imperfections are of two kinds to be called, 
respectively, statistical errors which are due to the finite num- 
ber of events in the event sample, and detector errors i.e. dis- 
tortions of individual events by measurement devices. Of 
course, the two cannot be strictly separated because detector 
errors may cause some events not to be seen at all but this is 
not important for our purposes. 

Theory provides a model for $\pi(P)$ controlled by a small
number of fundamental parameters such as the Standard
Model's $\alpha_s$, $M_W$, etc.

Theoretical knowledge may also involve imperfections, e.g. 
the necessity to describe hadronic data in terms of quarks and 
gluons in perturbative quantum chromodynamics (pQCD). 

Any data processing has, ultimately, two purposes.
One is to test the hypothesis of correctness of the underlying
theoretical model, which we will not discuss. The other
purpose is to extract the values of $\alpha_s$, $M_W$, ... from given
$\{P_i\}_i$ and $\pi(P)$.

This can be represented as follows: 



$$ \{P_i\}_i,\ \pi(P) \;\xrightarrow{\ \text{data processing algorithm}\ }\; \alpha_s, M_W, \ldots \qquad (2.2) $$



F.V.Tkachov hep-ph/9901444 [2 nd ed.] A theory of jet definition 2000-Jan-10 02:44 Page 4 of 45 



It is convenient to call the collection of events $\{P_i\}_i$
raw physical information. On the other hand, to obtain the
parameters on the r.h.s., one has to interpret data in terms of a
specific model, so such parameters are conveniently called
interpreted physical information.

The scheme 2.2 represents the much studied basic problem 
of mathematical statistics [9], [10]. However, we would like to 
regard it in the light of specifics of the formalism of quantum 
field theory where a central role is played by quantum opera- 
tors whose average values over ensembles of events are the 
quantum observables. In the language of mathematical statis- 
tics, this means that we are going to place emphasis on the 
generalized method of moments. 

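So we wish to consider the general scheme in which the
transformation 2.2 is accomplished by choosing suitable
functions $f(P)$ defined on events, and then finding the parameters
by equating their theoretical average values,

$$ \langle f \rangle_{\rm th} \equiv \langle f \rangle = \int dP\, \pi(P)\, f(P), \qquad (2.3) $$

where $\pi$ is supposed to be known so that this can be computed
for any values of fundamental parameters, with the
corresponding experimental values ($N$ is the number of events in the sample):

$$ \langle f \rangle_{\rm exp} = \frac{1}{N} \sum_i f(P_i). \qquad (2.4) $$

The scheme 2.2 becomes:

$$ \left. \begin{array}{l} \{P_i\}_i \xrightarrow{\ \text{observable } f\ } \langle f \rangle_{\rm exp} \\[4pt] \pi(P) \xrightarrow{\ \text{observable } f\ } \langle f \rangle_{\rm th} \end{array} \right\} \xrightarrow{\ \text{fit}\ } \alpha_s, M_W, \ldots \qquad (2.5) $$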



In terms of mathematical statistics, the weight $f$ is a
generalized moment. In the context of quantum field theory, to such
functions there correspond quantum operators in terms of
which the entire theory is formulated. We will be using the
quantum-theoretic term observable for such functions, and call
$\langle f \rangle$ its observable value.

The values of all possible observables $\langle f \rangle$ will be called
processed physical information, which is a model-independent
concept to be contrasted with the model-dependent notion of 
interpreted physical information (fundamental parameters). 

With processed physical information, one simply deals with 
all possible functions on events. Their general properties such 
as continuity play an important role in the analysis of sensitiv- 
ity of observables to experimental and theoretical imperfec- 
tions. Such properties can be called kinematical because they 
depend only on the general structure of detector errors and of 
the underlying formalism (quantum field theory), and can be 
studied in a model-independent manner (Section 3). 

All conventional data processing procedures (involving 
event selection, jet algorithms, histograms, etc.) are special 
cases of the scheme 2.5. In practice the fits of theoretical pre- 
dictions to experimental data often involve many observables 
(e.g. each bin of a histogram represents one numeric-valued 
observable). Such collections can be regarded simply as ob- 
servables that take non-numeric values (in the simplest inter- 
pretation, arrays, perhaps multidimensional; in a more sophis- 
ticated interpretation, the values may be functional objects). 
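
To make the remark about histograms concrete, here is a minimal Python sketch. The toy events, the bin edges and the helper names `bin_observable` and `observable_value` are illustrative assumptions, not taken from the text. Each bin is treated as one numeric-valued observable whose experimental value is the sample average of an indicator function, so that the histogram as a whole is an array-valued observable:

```python
def bin_observable(lo, hi):
    """Observable f(P): 1 if the event value falls into [lo, hi), else 0."""
    return lambda x: 1.0 if lo <= x < hi else 0.0

def observable_value(f, events):
    """Experimental value <f>_exp: the average of f over the event sample."""
    return sum(f(x) for x in events) / len(events)

# Hypothetical one-dimensional "events" (some quantity computed per event).
events = [0.2, 0.7, 1.1, 1.5, 1.9, 2.4, 2.6, 0.4]
edges = [0.0, 1.0, 2.0, 3.0]

# One observable per bin; together they form an array-valued observable.
histogram = [observable_value(bin_observable(a, b), events)
             for a, b in zip(edges, edges[1:])]
print(histogram)
```

Nothing in this construction requires the weight to be an indicator function; any other function of the event can be averaged in the same way.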

For explicitness' sake, here is an obvious but key axiom: 



The best observables $f(P)$ are those which yield the best
precision for fundamental parameters.

2.6 



It turns out that there is a general prescription to construct 
such observables in a systematic fashion. 



Optimal observables 2.7



Suppose one needs to extract the value of the fundamental
parameter $M$ on which depends the probability distribution
$\pi(P)$.^b We are going to study ways to choose an observable $f$
so as to determine $M$ to maximum precision. First we will
obtain an ideal explicit formula for such an optimal observable
(Eq. 2.17). The formula itself is essentially a translation of the
method of maximum likelihood into the language of moments,
but our derivation is somewhat unconventional and it allows us
to go further and study effects of small deviations from
optimality (Eq. 2.22; it seems to be a new result), and then arrive at
a prescription for a systematic practical construction of
quasi-optimal observables (Sec. 2.25). The prescription seems to be
both important and new.^c

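In the context of precision measurements one can assume
the magnitude of errors to be small. Under this assumption,
one can relate variations in the values of $M$ with variations in
the values of $\langle f \rangle$ as follows:

$$ \delta M = \left( \frac{\partial \langle f \rangle}{\partial M} \right)^{-1} \delta \langle f \rangle, \qquad (2.8) $$

where the derivative is applied only to the probability
distribution ($M$ is unknown, so even though the solution, $f_{\rm opt}$, will
depend on $M$, any such dependence is coincidental and therefore
"frozen" in this calculation):

$$ \frac{\partial \langle f \rangle}{\partial M} = \int dP\, f(P)\, \frac{\partial \pi(P)}{\partial M}. \qquad (2.9) $$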



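The axiom 2.6 translates into the requirement of minimizing
the expression 2.8 by an appropriate choice of $f$.

Then, for a sample of $N$ events, $\delta \langle f \rangle = N^{-1/2} \sqrt{\operatorname{Var} f}$, where:

$$ \operatorname{Var} f = \int dP\, \pi(P)\, \bigl( f(P) - \langle f \rangle \bigr)^2 \equiv \langle f^2 \rangle - \langle f \rangle^2. \qquad (2.10) $$

In terms of variances, Eq. 2.8 becomes:

$$ \operatorname{Var} M = \left( \frac{\partial \langle f \rangle}{\partial M} \right)^{-2} \operatorname{Var} f. \qquad (2.11) $$

We want to minimize this by a suitable choice of $f$.

A necessary condition for a minimum can be written in
terms of functional derivatives:^d

$$ \frac{\delta}{\delta f(P)} \operatorname{Var} M = 0. \qquad (2.12) $$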



Substitute Eq. 2.11 into 2.12 and use the following relations: 



^b We assume that all mechanisms of distortion of observed events are
included into the probability distribution $\pi(P)$. The problem of coping with
insufficient knowledge of detector errors that distort individual events is
discussed in Sec. 2.48.

^c New to the extent that I've seen no trace in the literature of its being
known to either theorists or experimentalists.

^d An interesting mathematical exercise of casting the following reasoning
(the functional derivatives, $\delta$-functionals, etc.) into a rigorous form is left
to interested mathematical parties. For practical purposes it is sufficient to
note that the range of validity of the prescriptions we obtain is practically
the same as for the maximum likelihood method; see Sec. 2.32.






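$$ \frac{\delta \langle f \rangle}{\delta f(P)} = \pi(P), \qquad \frac{\delta \langle f^2 \rangle}{\delta f(P)} = 2\, \pi(P)\, f(P), \qquad \frac{\delta}{\delta f(P)} \frac{\partial \langle f \rangle}{\partial M} = \frac{\partial \pi(P)}{\partial M}. \qquad (2.13) $$

After some simple algebra one obtains:

$$ f(P) = \langle f \rangle + \text{const} \times \frac{\partial \ln \pi(P)}{\partial M}, \qquad (2.14) $$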



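where the constant is independent of $P$. The constant plays no
role since $f$ is defined by this reasoning only up to a constant
factor. Noticing that

$$ \int dP\, \pi(P)\, \frac{\partial \ln \pi(P)}{\partial M} = \frac{\partial}{\partial M} \int dP\, \pi(P) = \frac{\partial}{\partial M}\, 1 = 0, \qquad (2.15) $$

we arrive at the following general family of solutions:

$$ f(P) = C_1\, \frac{\partial \ln \pi(P)}{\partial M} + C_2, \qquad (2.16) $$

where $C_i$ are independent of $P$ but may depend on $M$.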

For convenience of formal investigation we will usually deal 
with the following member of the family 2.16: 



$$ f_{\rm opt}(P) = \frac{\partial \ln \pi(P)}{\partial M}. \qquad (2.17) $$

Then Eq. 2.15 is essentially the same as

$$ \langle f_{\rm opt} \rangle = 0. \qquad (2.18) $$



• As a practical prescription, one may drop multiplicative and
additive $P$-independent constants from $f_{\rm opt}(P)$ without
violating optimality of the observable. However, Eq. 2.18 may then
be violated, and the relations such as 2.22 would then have to
be modified accordingly.
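
As an illustration of Eqs. 2.17 and 2.18, here is a minimal Python sketch under assumptions that are not part of the text: the "event" is an integer k between 0 and n drawn from a binomial distribution whose success probability plays the role of the parameter M, and the derivative in M is taken by a central finite difference (the names `binomial_pi`, `f_opt`, `average` and the step `h` are hypothetical). The sketch checks that the average of f_opt vanishes and that the average of its square reproduces the known Fisher information n/(M(1-M)) of this toy model:

```python
from math import comb

def binomial_pi(k, n, M):
    """Toy probability pi(P) for an 'event' k in {0, ..., n}; M is the parameter."""
    return comb(n, k) * M**k * (1.0 - M)**(n - k)

def f_opt(k, n, M, h=1e-6):
    """Eq. 2.17: f_opt(P) = d ln pi(P) / dM, via a central finite difference."""
    d_pi = (binomial_pi(k, n, M + h) - binomial_pi(k, n, M - h)) / (2.0 * h)
    return d_pi / binomial_pi(k, n, M)

def average(f, n, M):
    """<f> = sum over events of pi(P) f(P), the discrete analogue of Eq. 2.3."""
    return sum(binomial_pi(k, n, M) * f(k) for k in range(n + 1))

n, M = 10, 0.3
mean_f_opt = average(lambda k: f_opt(k, n, M), n, M)        # Eq. 2.18: close to zero
info_opt = average(lambda k: f_opt(k, n, M) ** 2, n, M)     # <f_opt^2>
print(mean_f_opt, info_opt, n / (M * (1.0 - M)))
```

The finite difference is used only to stress that nothing beyond the ability to evaluate the probability at nearby values of M is needed; for this toy model the derivative is of course available in closed form.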

The solution 2.17 is a local quadratic minimum 2.19

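Consider 2.11 as a functional of $f$, $\operatorname{Var} M[f]$. Assume $\varphi$ is a
function of events such that $\langle \varphi^2 \rangle < \infty$. We are going to
evaluate the functional Taylor expansion of $\operatorname{Var} M[f_{\rm opt} + \varphi]$ with
respect to $\varphi$ through quadratic terms:

$$ \operatorname{Var} M[f_{\rm opt} + \varphi] = \operatorname{Var} M[f_{\rm opt}] + \frac{1}{2} \int \left. \frac{\delta^2 \operatorname{Var} M[f]}{\delta f(P)\, \delta f(Q)} \right|_{f = f_{\rm opt}} \varphi(P)\, \varphi(Q)\, dP\, dQ + \ldots \qquad (2.20) $$

It is sufficient to use functional derivatives and relations such
as 2.13 and

$$ \frac{\delta f(Q)}{\delta f(P)} = \delta(P, Q), \qquad \int \delta(P, Q)\, \varphi(P)\, dP = \varphi(Q). \qquad (2.21) $$

We obtain the following result which appears to be new:

$$ \operatorname{Var} M[f_{\rm opt} + \varphi] = \frac{1}{\langle f_{\rm opt}^2 \rangle} + \frac{1}{\langle f_{\rm opt}^2 \rangle^3} \left\{ \langle f_{\rm opt}^2 \rangle \langle \bar\varphi^2 \rangle - \langle f_{\rm opt}\, \bar\varphi \rangle^2 \right\} + \ldots \qquad (2.22) $$

where $\bar\varphi = \varphi - \langle \varphi \rangle$.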

Non-negativity of the factor in curly braces follows from the
standard Schwarz inequality. □
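
The expansion 2.22 can be checked numerically. The following Python sketch is only an illustration under stated assumptions: the event is an integer k distributed binomially with success probability M (a toy model not taken from the text), and the perturbation eps * k**2 is an arbitrary choice. It compares the exact value of Var M from Eq. 2.11 for the observable f_opt + phi with the first two terms on the r.h.s. of 2.22:

```python
from math import comb

N_TRIALS, M = 10, 0.3   # toy model (an assumption): event k ~ Binomial(N_TRIALS, M)

def pi(k):
    return comb(N_TRIALS, k) * M**k * (1.0 - M)**(N_TRIALS - k)

def f_opt(k):
    """Eq. 2.17 for the toy model: d ln pi / dM in closed form."""
    return k / M - (N_TRIALS - k) / (1.0 - M)

def avg(f):
    return sum(pi(k) * f(k) for k in range(N_TRIALS + 1))

def var_M_exact(f):
    """Eq. 2.11; d<f>/dM = <f f_opt> is Eq. 2.9 with d pi/dM = pi * f_opt."""
    var_f = avg(lambda k: f(k) ** 2) - avg(f) ** 2
    slope = avg(lambda k: f(k) * f_opt(k))
    return var_f / slope ** 2

def var_M_quadratic(phi):
    """The first two terms on the r.h.s. of Eq. 2.22."""
    phi_mean = avg(phi)
    a = avg(lambda k: f_opt(k) ** 2)
    b = avg(lambda k: f_opt(k) * (phi(k) - phi_mean))
    c = avg(lambda k: (phi(k) - phi_mean) ** 2)
    return 1.0 / a + (a * c - b ** 2) / a ** 3

eps = 0.01
phi = lambda k: eps * k ** 2            # an arbitrary small deviation
exact = var_M_exact(lambda k: f_opt(k) + phi(k))
approx = var_M_quadratic(phi)
minimum = 1.0 / avg(lambda k: f_opt(k) ** 2)
print(minimum, exact, approx)
```

In this toy model the exact value stays above the minimum, and the quadratic term accounts for almost all of the excess, as expected for a small deviation.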



• The first term on the r.h.s. of 2.22, $\langle f_{\rm opt}^2 \rangle^{-1}$, is the absolute
minimum for the variance of $M$ as established by the
fundamental Rao-Cramer inequality [9], [10]. The latter is valid for
all $\varphi$ and therefore is somewhat stronger than the result 2.22
which we have obtained only for sufficiently small $\varphi$.
However, Eq. 2.22 gives a simple explicit estimate for the deviation
from optimality and so makes possible the practical
prescriptions of Sec. 2.25.
The quantity

$$ I_{\rm opt} = \langle f_{\rm opt}^2 \rangle \qquad (2.23) $$

is closely related to Fisher's information [9], [10].

More generally, it will be convenient to talk about
informativeness $I_f$ of an observable $f$ with respect to the parameter
$M$, defined by

$$ I_f = \bigl( \operatorname{Var} M[f] \bigr)^{-1}. \qquad (2.24) $$

The smaller the error of the value of $M$ extracted using $f$, the
larger the informativeness of $f$.

Then $I_{\rm opt}$ is simply the informativeness of $f_{\rm opt}$.

Note that Fisher's information is an attribute of data 
whereas the informativeness is a property of an observable. 

It is also possible to talk about an optimal observable from a 
restricted class of observables. An example of such restriction 
is considered in Sec. 4.52. 



Quasi-optimal observables 2.25



The fact that the solution 2.17 is the point of a quadratic
minimum means that any observable $f_{\rm quasi}$ which is close to
2.17 would be practically as good as the optimal solution (we
will call such observables quasi-optimal). A quantitative
measure of closeness is given by comparing the $O(1)$ and
$O(\varphi^2)$ terms on the r.h.s. of 2.22:

$$ \frac{\langle f_{\rm opt}^2 \rangle \langle \bar\varphi^2 \rangle - \langle f_{\rm opt}\, \bar\varphi \rangle^2}{\langle f_{\rm opt}^2 \rangle^2} \ll 1, \qquad (2.26) $$

where $\bar\varphi = f_{\rm quasi} - \langle f_{\rm quasi} \rangle - f_{\rm opt}$.

The subtracted term in the numerator can be dropped,
which only overestimates the l.h.s. and is safe. Assuming for
simplicity of formulas that $\langle f_{\rm quasi} \rangle = 0$, the criterion 2.26 takes
the following simple form:

$$ \bigl\langle [f_{\rm quasi} - f_{\rm opt}]^2 \bigr\rangle \ll \langle f_{\rm opt}^2 \rangle. \qquad (2.27) $$

Here is the representation in terms of integrals:

$$ \int dP\, \pi(P)\, [f_{\rm quasi}(P) - f_{\rm opt}(P)]^2 \ll \int dP\, \pi(P)\, f_{\rm opt}^2(P). \qquad (2.28) $$

The criterion 2.27 may be more useful in the practical
construction of $f_{\rm quasi}$, and since the latter would tend to oscillate
around $f_{\rm opt}$, causing $\langle f_{\rm opt}\, \bar\varphi \rangle$ to be suppressed, the difference
between 2.27 and 2.26 may be negligible.

As a rule of thumb, one would aim to minimize the
bracketed expression on the l.h.s. of 2.28 for each (or "most") $P$:

$$ [f_{\rm quasi}(P) - f_{\rm opt}(P)]^2 \ll f_{\rm opt}^2(P). \qquad (2.29) $$
One can talk about non-optimality of observables (i.e. their
lower informativeness compared with the optimal observable)
and also about sources of non-optimality. These have a simple
interpretation in the case of quasi-optimal observables as the
deviations of $f_{\rm quasi}(P)$ from $f_{\rm opt}(P)$ which give sizeable
contributions to the integral in 2.27. The simplest example is when
$f_{\rm opt}$ is a continuous smoothly varying function whereas $f_{\rm quasi}$ is a
piecewise constant approximation. Then $f_{\rm quasi}$ would usually
deviate most from $f_{\rm opt}$ near the discontinuities which, therefore,
are naturally identified as sources of non-optimality.

It is practically sufficient to take Eq. 2.17 at some value
$M = M_0$ close to the true one (which is unknown anyway). This
is usually possible in the case of precision measurements. One
could also perform an iterative procedure for $M$ starting from
$M_0$, then replacing $M_0$ with the value newly found, etc.; this
procedure is closely related to the optimization in the maximum
likelihood method.

So the method of quasi-optimal observables is as follows (see box 2.30):



Furthermore, it is possible to use an approximate shape for
the r.h.s. of 2.17 such as given by a few terms of a perturbative
expansion. In terms of quantum-field-theoretic perturbation
theory this means that it may be sufficient to construct $f_{\rm quasi}$ on
the basis of the expressions for probability distribution (matrix
elements squared) obtained in the lowest PT order in which the
dependence on the parameter manifests itself: theoretical
updates of radiative corrections need not be reflected in the
quasi-optimal observables. It may also be convenient to use a
piecewise linear $f_{\rm quasi}$ or even a piecewise constant one. The latter
option actually corresponds to conventional procedures based on cuts
(cf. Sec. 2.35); however, using piecewise linear approximations
for $f_{\rm quasi}$ should yield noticeably better results without incurring
noticeable algorithmic complications.

If the dimensionality of the space of events is not large then
it may be possible to construct a suitable $f_{\rm quasi}$ in a brute force
fashion, i.e. build a multi-dimensional interpolation formula
for $\pi(P)$ (via an adaptive routine similar to those used e.g. in
[11]) for two or more values of $M$ near the value of interest,
and perform the differentiation in $M$ numerically.

Also, one can use different expressions for $f_{\rm quasi}$: e.g.
perform a few first iterations with a simple shape for faster
calculations and then switch to a more sophisticated interpolation
formula for best precision.
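
The method of quasi-optimal observables described in this subsection can be sketched in a few lines of Python. Everything specific here is an assumption made for illustration and is not taken from the text: the events are points x on the real axis distributed according to a Breit-Wigner (Cauchy) shape of known width `GAMMA` centred at the unknown M; the exact f_opt = 2u/(u² + Γ²), u = x − M, is replaced by a piecewise-linear `f_quasi`; the theoretical average is evaluated by a midpoint rule; and the fit is a plain bisection. The error estimate via 2.11 is omitted for brevity.

```python
import math
import random

GAMMA = 1.0  # width of the toy resonance, assumed known in this sketch

def pi_density(x, M):
    """Toy pi(P): Breit-Wigner (Cauchy) density in x, centred at the parameter M."""
    return GAMMA / (math.pi * ((x - M) ** 2 + GAMMA ** 2))

def f_quasi(x, M0):
    """Piecewise-linear stand-in for f_opt = 2u / (u**2 + GAMMA**2), u = x - M0.

    Linear up to the peaks of f_opt at |u| = GAMMA, then falling linearly
    to zero at |u| = 3 * GAMMA; zero beyond that.
    """
    t = (x - M0) / GAMMA
    a = abs(t)
    if a <= 1.0:
        shape = t
    elif a <= 3.0:
        shape = math.copysign(0.5 * (3.0 - a), t)
    else:
        shape = 0.0
    return shape / GAMMA

def mean_th(M, M0, steps=2000):
    """<f_quasi>_th at parameter value M: midpoint rule over the support of f_quasi."""
    lo = M0 - 3.0 * GAMMA
    h = 6.0 * GAMMA / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * h
        total += pi_density(x, M) * f_quasi(x, M0)
    return total * h

def mean_exp(events, M0):
    """<f_quasi>_exp: plain average over the event sample."""
    return sum(f_quasi(x, M0) for x in events) / len(events)

def fit_M(events, M0, iterations=3):
    """Solve <f_quasi>_th(M) = <f_quasi>_exp for M, then restart from the result."""
    for _ in range(iterations):
        target = mean_exp(events, M0)
        lo, hi = M0 - GAMMA, M0 + GAMMA   # <f_quasi>_th grows with M on this bracket
        for _ in range(40):               # bisection
            mid = 0.5 * (lo + hi)
            if mean_th(mid, M0) < target:
                lo = mid
            else:
                hi = mid
        M0 = 0.5 * (lo + hi)
    return M0

random.seed(12345)
M_TRUE = 10.0
events = [M_TRUE + GAMMA * math.tan(math.pi * (random.random() - 0.5))
          for _ in range(20000)]
M_fit = fit_M(events, M0=10.3)
print(M_fit)
```

The Cauchy shape is chosen because the sample mean is useless as an estimator there, so the benefit of a bounded weight that follows f_opt is easy to see; the particular break points of `f_quasi` are a design choice and could be tuned against the criterion 2.27.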

Several parameters 2.31 

With several parameters to be extracted from data there are 
the usual ambiguities due to reparametrizations but one can 
always define an observable per parameter according to 2.17. 
Then the informativeness 2.24 is a matrix (as is Fisher's
information).

Since the covariance matrix of (quasi-) optimal observables 
is known (or can be computed from data), the mapping of the 
corresponding error ellipsoids for different confidence levels
from the space of values of the observables into the space of 
parameters is straightforward. 

Connection with maximum likelihood 2.32 

The prescription 2.17 is closely related to the standard
method of maximum likelihood that prescribes to estimate $M$
by the value which maximizes the likelihood function:

$$ \sum_i \ln \pi(P_i), \qquad (2.33) $$

where summation runs over all events from the sample. The
necessary condition for the maximum of 2.33 is

$$ \frac{\partial}{\partial M} \sum_i \ln \pi(P_i) = \sum_i \frac{\partial \ln \pi(P_i)}{\partial M} = 0. \qquad (2.34) $$

This agrees with 2.17 thanks to 2.18.

So the formula 2.17 can be regarded as a translation of the 
method of maximum likelihood (which is known to yield the 
theoretically best estimate for M [9], [10]) into the language of 
the generalized method of moments. 

Equivalents of the formula 2.17 can be found at intermediate
stages of examples of derivations of estimators for
parameters of standard (e.g. normal) probability distributions
according to the maximum likelihood method.^e

The method of quasi-optimal observables is expected to 
yield results on a par with the maximum likelihood method 
(because of their close relation; see Sec. 2.25) but it has the 
following advantages: 

(i) applicability to situations with millions of events; 

(ii) a greater flexibility in the case of complicated $\pi(P)$.

In such situations a direct maximization of the likelihood
function 2.33 is infeasible.
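
The connection with maximum likelihood can be illustrated with a small Python sketch. The toy model is an assumption made for illustration and is not part of the text: events are decay times x with density exp(-x/M)/M, so that f_opt(x) = (x − M)/M², and the short list `lifetimes` is a hypothetical sample. The sketch solves the moment equation (vanishing experimental average of f_opt) by bisection and compares the result with a crude grid scan of the likelihood function 2.33:

```python
import math

lifetimes = [0.5, 1.2, 0.3, 2.8, 0.9, 1.7, 0.1, 3.5]   # hypothetical event sample

def log_likelihood(M):
    """Eq. 2.33 for the toy density pi(x; M) = exp(-x / M) / M."""
    return sum(-math.log(M) - x / M for x in lifetimes)

def mean_f_opt(M):
    """Experimental average of f_opt(x) = d ln pi / dM = (x - M) / M**2."""
    return sum((x - M) / M ** 2 for x in lifetimes) / len(lifetimes)

# Method of moments: solve <f_opt>_exp = 0 by bisection.
# The average is positive below the root and negative above it.
lo, hi = 0.1, 10.0
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if mean_f_opt(mid) > 0.0:
        lo = mid
    else:
        hi = mid
M_moments = 0.5 * (lo + hi)

# Maximum likelihood: crude grid scan with step 0.001 for comparison.
grid = [0.1 + 0.001 * i for i in range(9901)]
M_likelihood = max(grid, key=log_likelihood)
print(M_moments, M_likelihood)
```

For this toy density both routes lead to the sample mean of the decay times; the point of the moment formulation is that it needs only one pass over the events per value of M and remains usable when the observable is replaced by a cheaper quasi-optimal one.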

Connection of Eq. 2.17 with event selection 2.35 

As a simple consistency check, note that Eq. 2.17 agrees
with the simplest procedures of event selection used to isolate
the signal and suppress backgrounds.

For instance, suppose that most of the sensitivity of $\pi(P)$ to $M$
(i.e. where the derivative $\partial_M \pi$ is largest) is localized in some region
$\Pi$ of events (e.g. due to a superselection rule or if $M$ is the
mass of a particle that predominantly decays into a certain
number of jets). Then $f_{\rm opt}(P)$ vanishes outside $\Pi$:

$$ f_{\rm opt}(P) = 0 \quad \text{if } P \notin \Pi. \qquad (2.36) $$

A popular procedure in such a situation is to introduce a
selection criterion (a cut):

$$ P \text{ satisfies the selection criterion} \iff P \in \Pi, \qquad (2.37) $$

and to compute the fraction of events from that region, i.e. the
observable defined by

$$ f_{\rm crude}(P) = \theta(P \text{ satisfies the selection criterion}), \qquad (2.38) $$

where the $\theta$-function is defined according to

$$ \theta(\text{TRUE}) = 1; \qquad \theta(\text{FALSE}) = 0. \qquad (2.39) $$



^e Rather surprisingly, none of a dozen or so textbooks and monographs on
mathematical statistics that I checked (including a comprehensive practical 
guide [9] and a comprehensive mathematical treatment [10]) explicitly 
formulated the prescription in terms of the method of moments. 



(1) construct an observable $f_{\rm quasi}$ using 2.17 as a guide so that
$f_{\rm quasi}$ is close to $f_{\rm opt}$ in the integral sense of Eq. 2.26;

(2) find $M$ by fitting $\langle f_{\rm quasi} \rangle_{\rm th}$ against $\langle f_{\rm quasi} \rangle_{\rm exp}$;

(3) estimate the error for $M$ via 2.11;

(4) $f_{\rm quasi}$ may depend on $M$; to find it, one can optionally use
an iterative procedure starting from some value $M_0$ close to
the true one.

2.30
In other words, with the observable 2.38 one simply ignores all
non-trivial dependence of $f_{\rm opt}$ on $P$ inside $\Pi$.

Furthermore, if in some region $\Pi'$ the magnitude of $\pi(P)$ is
large and not offset by its sensitivity to $M$ (the situation of a
"large background") then one introduces another selection
criterion similar to 2.38, and so on. The net effect is that the
observable takes the form

$$ f_{\rm crude}(P) = \prod_i \theta(P \text{ satisfies the } i\text{-th selection criterion}) \times \ldots \qquad (2.40) $$

In general, $f_{\rm crude}$ may also contain a factor other than a
$\theta$-function (shown with dots above). For instance, in the case of a
histogram for some differential distribution of events, each bin 
corresponds to an observable of the form 2.40 (the last selec- 
tion criterion is whether or not a value computed for the event 
belongs to the bin). Then the non-trivial factor (shown with 
dots) may take e.g. integer values such as the number of dijets 
from P that fall into the bin corresponding to an interval of in- 
variant masses (with each bin representing one observable). 

• An immediate practical prescription from the above concerns intermediate regions where either $\partial_M \pi$ is not small enough or $\pi(\mathbf{P})$ is not large enough. Then one should make $f$ interpolate between 0 and 1 over such intermediate regions.
It is clear from Eqs. 2.27-2.29 that such a procedure would in- 
crease informativeness of the observable. Simple prescriptions 
for that are considered in Sec. 2.52. The numerical effect here 
can be non-negligible (Sec. 2.62). 



Optimal observables and the $\chi^2$ method 2.41

The popular $\chi^2$ method makes a fit with a number of non-optimal observables (bins of a histogram). The histogramming implies a loss of information but the method is universal and implemented in standard software routines. On the other hand, the choice of $f_{\mathrm{quasi}}$ requires a problem-specific effort but then the loss of information can be made negligible by a suitable adjustment of $f_{\mathrm{quasi}}$.

The balance is, as usual, between the quality of custom so- 
lutions and the readiness of universal ones. However, once 
quasi-optimal observables are found, the quality of maximum 
likelihood method seems to become available at a lower com- 
putational cost. 

The two methods are best regarded as complementary: One could first employ the $\chi^2$ method to verify the shape of the probability distribution and obtain the value of $M_0$ to be used as a starting point in the method of quasi-optimal observables in order to obtain the best final estimate for $M$.

A theoretical importance of the optimal observables is that 
the explicit (even if formal) expressions for optimal obser- 
vables (cf. 4.20) shed light on the problem of optimal con- 
struction of complex data processing algorithms (such as jet 
finding algorithms). The concept of optimal observables offers 
specific guidelines for construction and comparison of such al- 
gorithms by simply regarding them as a tool for construction of 
quasi-optimal observables. 



Example. The Breit-Wigner shape 2.42

Let $P$ be random real numbers distributed according to

$\pi(P) = \frac{1}{\pi}\,\frac{\Gamma}{(M-P)^2+\Gamma^2}$.    2.43

There are two parameters here, and with more than one parameter in the problem there are the usual ambiguities due to reparametrizations. However, one still can define an observable per each parameter according to 2.17. For 2.43, one obtains:

$f_{M,\mathrm{opt}}(P) = \frac{\partial}{\partial M}\ln\pi(P) = -\frac{2(M-P)}{(M-P)^2+\Gamma^2}$,    2.44

$f_{\Gamma,\mathrm{opt}}(P) = \frac{\partial}{\partial \Gamma}\ln\pi(P) \to \frac{\pi\Gamma}{(M-P)^2+\Gamma^2}$.    2.45

(Recall that there is an arbitrariness in the definition of optimal observables as described by 2.16. The arbitrariness can be used to simplify and conveniently normalize the optimal observables, as done in 2.45.)

The above two weights happen to be uncorrelated:

$\langle f_{M,\mathrm{opt}}\, f_{\Gamma,\mathrm{opt}} \rangle = 0$.    2.46

It is interesting to observe how $f_{M,\mathrm{opt}}$ emphasizes contributions of the slopes of the bump, exactly where the magnitude of $\pi(P)$ is most sensitive to variations of $M$, whereas taking contributions from the two slopes with a different sign maximizes the sensitivity to the signal (i.e. information on $M$). At the same time it suppresses contributions from the middle part of the bump which generates mostly noise as far as $M$ is concerned.
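The behaviour just described can be checked numerically. The following is a minimal sketch in plain Python, assuming the normalized Breit-Wigner form for 2.43; the function names are illustrative and not taken from the paper, and the weight for $\Gamma$ is the raw derivative $1/\Gamma - 2\Gamma/[(M-P)^2+\Gamma^2]$, i.e. before the simplification allowed by 2.16.

```python
import math

def bw_density(p, m, gamma):
    """Normalized Breit-Wigner density (assumed form of Eq. 2.43)."""
    return gamma / (math.pi * ((m - p) ** 2 + gamma ** 2))

def f_m_opt(p, m, gamma):
    """d(ln pi)/dM: the optimal weight for M (cf. Eq. 2.44)."""
    return -2.0 * (m - p) / ((m - p) ** 2 + gamma ** 2)

def f_gamma_opt(p, m, gamma):
    """d(ln pi)/dGamma, before the simplification allowed by 2.16."""
    return 1.0 / gamma - 2.0 * gamma / ((m - p) ** 2 + gamma ** 2)

def mean_product(m, gamma, half_width=200.0, steps=40000):
    """Midpoint-rule estimate of <f_M f_Gamma> on a window symmetric about M.

    The integrand is odd about P = M, so the estimate should vanish
    up to rounding (cf. Eq. 2.46).
    """
    h = 2.0 * half_width / steps
    total = 0.0
    for k in range(steps):
        p = m - half_width + (k + 0.5) * h
        total += bw_density(p, m, gamma) * f_m_opt(p, m, gamma) * f_gamma_opt(p, m, gamma)
    return total * h
```

With hypothetical values $M = 91$, $\Gamma = 2.5$, the weight `f_m_opt` vanishes at the peak $P = M$ and reaches its extreme values $\pm 1/\Gamma$ on the slopes at $P = M \pm \Gamma$, which is the pattern of emphasis described in the text.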

• Unlike theoretical matrix elements which must include all 
known small corrections (cf. the programs for precision calcu- 
lations of LEP1 processes [13]), the observables such as 2.44, 
2.45 need not incorporate, say, loop corrections to $\Gamma$ although
inclusion of some such information might be useful (e.g. by 
introducing simple shapes via linear splines, etc.; cf. comments 
after 2.30). 

• Connection with the techniques of wavelets [12]. 

The form of 2.44 is reminiscent of a typical wavelet, which indicates that applying a wavelet filter to theoretical predictions and experimental formulas instead of the conventional binning prior to using the $\chi^2$ method would improve results. Since software implementations of the wavelet-based methods are available (e.g. on the Web), this could be a way to approach the quality of optimal observables via software routines as universal as those implementing the $\chi^2$ method.



Continuity of observables 2.47

To directly use the prescription 2.17 may not be possible because of insufficient information about $\pi$. On the other hand,
it is reasonable to ask what are the general properties of opti- 
mal observables which ensure the best control of uncertainties. 
With such a knowledge one could ensure that the pragmatically 
constructed observables at least possess those properties. 
For instance, one could start with an ad hoc observable, iden- 
tify sources of its non-optimality (Eq.2.29 and remarks there- 
after) and modify the observable to mitigate their effect. 

From the above reasoning it follows that continuous observ- 
ables are less sensitive to statistical and detector errors. 

There are several reasons why one should prefer continuous 
observables: 

(i) Optimal observables $f_{\mathrm{opt}}$ inherit continuity properties of $\pi(\mathbf{P})$. In the problems we consider the latter is always a continuous function of the particles' parameters.

(ii) The variance 2.10 is smaller for the more continuous and slower varying functions $f(\mathbf{P})$. It tends to be larger for $f(\mathbf{P})$ which have jumps or vary fast.

(iii) The error suppression effect of replacing a discontinuous observable by a continuous one (the so-called regularization) can be significant (Sec. 2.62).

(iv) Fluctuations induced by detector errors (distortions of the individual events $\mathbf{P}_i$) are best dampened in final results if the
observables possess special continuity properties. Let us briefly 
discuss this. 



Taking into account detector errors 2.48

In reality the experimentally observed events $\mathbf{P}_i$ contain distortions due to detector errors. This is expressed by a convolution:

$\pi(\mathbf{P}) = \int d\mathbf{P}'\, \pi_{\mathrm{ideal}}(\mathbf{P}')\, D(\mathbf{P}', \mathbf{P})$,    2.49

where $D(\mathbf{P}', \mathbf{P})$ is the probability for the detector installation to see the event $\mathbf{P}$ if it is actually $\mathbf{P}'$. Then the optimal observables are built from 2.49 and must inherit the smearing induced by $D$.

The difficulty here is that it may be hard to take into ac- 
count the exact form of D . Then the least one can say is that 
the smeared probability distribution 2.49 — and hence the op- 
timal observables — are continuous. 

This is not very informative in the simple case where $\mathbf{P}$ are, say, random points on the real axis. However, in high energy physics events $\mathbf{P}$ contain a fluctuating number of particles, each described by at least three numbers (energy, $\varphi$, $\theta$), so that one deals with $O(1000)$ degrees of freedom, i.e. the dimensionality of the space of events is practically infinite. In infinitely-dimensional spaces radically different notions of convergence/continuity¹ are possible (cf. the uniform convergence and integral convergences such as $L_2$ for functions on the real axis), and the significance of the different available options is often missed, so it may not be easy to make the correct choice. In jet-related problems, the relevant continuity is the so-called C-continuity (Sec. 3.18). However, for practical purposes it may be useful to keep in mind the following rule of thumb:

If continuity turns out to be important, then any (non- 
pathological) kind of continuity is better than step-like dis- 
continuities. 

2.50 

So, measurements based on conventional event selection pro- 
cedures can often be improved via replacements of hard cuts by 
continuously varying observables. The simplest prescription for 
that is described below (Sec. 2.52). It is rather universal and 
insensitive to the specific nature of events and cuts one deals 
with in a particular application. 



On the concept of regularization 2.51



Such a prescription is a special instance of the general con- 
cept of regularization (see [14] for a systematic treatment and 
history). A regularization is needed whenever there is a priori 
information about the exact solution (such as its continuity) 
which is not reflected in the approximations one's method 
yields. This can happen either when one uses crude heuristics 
(such as event selection procedures) or when one uses theoreti- 
cal methods which are likely to yield singular solutions (such 
as those encountered in pQCD; cf. the discussion around 
Eq.4.4). 



1 We use the terms convergence or continuity in place of the standard 
mathematical term topology as more suggestive and to avoid confusion 
with "topology of event". 



Generally speaking, the regularization is a projection of the 
candidate solutions to the subspace where the exact solution is 
supposed to reside. 

Regularizations may take very different forms depending on the specifics of the problem. One example is the Fejér summation of Fourier series for continuous functions. Here the algorithmic simplicity of the method of Fourier expansion comes into conflict with the continuity of the solution, and it proves easier first to sacrifice continuity in order to take advantage of the power of the Fourier method, and then resort to a special trick such as the Fejér resummation to ensure a uniform convergence to the continuous solution.

Another example is the histogramming of events which, technically, is a transformation of a sum of $\delta$-functions corresponding to individual events into an ordinary function. In this case only a singular approximation (a finite set of events) for a continuous function (the probability distribution) is provided by Nature as a matter of principle.



Empirical regularization of cuts 2.52

The simplest procedure to transform a typical observable corresponding to an event selection procedure, Eq. 2.40, into a continuous function consists in replacing the step functions with simple piecewise linear continuous functions. The simplest way is to regularize each $\theta$-function in 2.40 individually:

$f_{\mathrm{reg}}(\mathbf{P}) = \prod_i \theta^{\mathrm{reg}}_i(\mathbf{P}) \times \ldots$    2.53

This can be accomplished as follows.

Each selection criterion in 2.40 can be reduced to the following generic form

$\varphi(\mathbf{P}) > c_{\mathrm{cut}}$,    2.54

where the l.h.s. is a continuous function of the event. For instance, this could be a cut on the total energy of the observed event, then $\varphi(\mathbf{P})$ is the total energy.

Instead of a single parameter $c_{\mathrm{cut}}$, one now chooses a regularization interval specified by two values

$c_{\mathrm{lo}} < c_{\mathrm{cut}} < c_{\mathrm{hi}}$, $r = c_{\mathrm{hi}} - c_{\mathrm{lo}} > 0$.    2.55

Here $r$ is the so-called regularization parameter. The simplest option is a symmetric choice:

$r = 2(c_{\mathrm{hi}} - c_{\mathrm{cut}}) = 2(c_{\mathrm{cut}} - c_{\mathrm{lo}})$.    2.56

One defines:

$\theta^{\mathrm{reg}}(\mathbf{P}) = \begin{cases} 1 & \text{if } \varphi(\mathbf{P}) \geq c_{\mathrm{hi}}, \\ 0 & \text{if } \varphi(\mathbf{P}) \leq c_{\mathrm{lo}}, \\ \dfrac{\varphi(\mathbf{P}) - c_{\mathrm{lo}}}{r} & \text{if } c_{\mathrm{lo}} < \varphi(\mathbf{P}) < c_{\mathrm{hi}}. \end{cases}$    2.57





The linear form is chosen solely from considerations of sim- 
plicity. One could also use any other continuous (usually mo- 
notonic) shape which interpolates between the same values at 
the endpoints of the regularization interval. This is a useful 
option when r is large. 
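As an illustration, the piecewise-linear prescription 2.57 with the symmetric choice 2.56 can be written in a few lines. This is only a sketch: the names `theta_reg` and `symmetric_interval` are hypothetical, and the value of the continuous quantity $\varphi(\mathbf{P})$ is assumed to have been computed elsewhere and passed in as a number.

```python
def symmetric_interval(c_cut, r):
    """Symmetric choice 2.56: an interval of width r centred on the cut."""
    if r <= 0.0:
        raise ValueError("the regularization parameter r must be positive")
    return c_cut - 0.5 * r, c_cut + 0.5 * r

def theta_reg(phi, c_lo, c_hi):
    """Piecewise-linear regularized step function of Eq. 2.57.

    phi  -- value of the continuous quantity being cut on
    c_lo -- lower end of the regularization interval
    c_hi -- upper end of the regularization interval
    """
    if not c_lo < c_hi:
        raise ValueError("the regularization interval requires c_lo < c_hi")
    if phi >= c_hi:
        return 1.0
    if phi <= c_lo:
        return 0.0
    return (phi - c_lo) / (c_hi - c_lo)
```

For example, a cut at `c_cut = 10` regularized with `r = 4` gives the interval `(8, 12)`; an event with `phi = 10` then receives the weight 0.5 instead of sitting exactly on a discontinuity.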

For different selection criteria participating in 2.40, 2.53 the regularization interval and the shape of $\theta^{\mathrm{reg}}$ can be chosen independently.

The most important parameter which controls the shape of $\theta^{\mathrm{reg}}$ and therefore suppression of errors is $r$.

In the context of jet-related measurements one aims to achieve C-continuity of observables. Then $\theta^{\mathrm{reg}}$ and $f_{\mathrm{reg}}$ would also be C-continuous if such is $\varphi(\mathbf{P})$ in 2.54.

• Warning It is possible, as a psychological crutch, to inter- 
pret the resulting weights as probabilities that the events carry 
the characteristics one uses for event selection. (Such an inter- 
pretation was mentioned in [4] in a footnote.) However, one 
should be explicitly warned against introducing a stochastic 
decision-making according to 2.57 (a procedure of this kind 
was suggested in [15]). Such additional stochasticity would 
only be an additional source of fluctuations and therefore in- 
crease variance (as can be easily verified in a formal manner), 
and thus not only defeat the purpose of regularization but exac- 
erbate the problem. 

Choosing the regularization interval 2.58 

A lower bound on useful values of $r$ is set by detector errors. Let $\sigma_{\mathrm{meas}}$ be the usual sigma for the errors induced in the values of $\varphi(\mathbf{P})$ by the distortions of $\mathbf{P}$ due to detector errors. The effect of error suppression is negligible unless

$r > 2\sigma_{\mathrm{meas}}$.    2.59

For larger $r$ the suppression factor increases as $O(r^{1/2})$ (see sec. 2.6 of [4]). The suppression effect for statistical errors is also greater for larger $r$, and so $r$ should be chosen as large as possible, in general. The best guidance here is Eq. 2.17. A high precision in the choice of regularization interval is not required. However, for large $r$ one may wish to choose more complex shapes than 2.57 (e.g. consisting of several linear pieces glued together).

The regularization interval may also be restricted by other considerations, especially for large $r$ (e.g. onset of a different physical mechanism which causes a large background). In such cases one may opt for more asymmetry than in 2.56.

Some idea about a potentially possible magnitude of suppression effects can be obtained from Sec. 2.62.

Algorithmic aspect 2.60

The simplest first step towards a systematic use of regularization is to introduce a special 4-byte real field for the weight, for each event. The field is initialized to 1. As the event passes selection stages, the weight is modified according to 2.57 and 2.53. If the weight becomes zero at some selection stage, the event is dropped as usual. In the end the selected events' weights are summed up instead of the simple counting of events. Similarly modified should be observables built from selected events:

$\langle f \rangle_{\mathrm{reg}} = N^{-1} \sum_i w_i\, f(\mathbf{P}_i)$.    2.61

In particular, what used to be an event fraction now becomes $N^{-1} \sum_i w_i$. The usual procedure corresponds to $w_i = 1$.

The algorithm described by 2.57 requires only a universal few-lines subroutine.

Generic examples 2.62

Some idea about the effect of regularization on the sensitivity of observables to statistical errors is given by the following one-dimensional examples. However, the conclusions have a more general validity (see below).

Let $\mathbf{P}$ be a point from the segment [0, 1]. Compare $f_1(\mathbf{P}) = \theta(x > 1/2)$ (a hard cut) and $f_2(\mathbf{P}) = x$ (a continuous analog). The suppression of relative errors is given by the ratio of the two factors $\sigma_i = \sqrt{\mathrm{Var} f_i}\, \langle f_i \rangle^{-1}$. One obtains:

$\sigma_1 / \sigma_2 = 1.7$ for $\pi(\mathbf{P}) = 1$; $\sigma_1 / \sigma_2 = 1.6$ for $\pi(\mathbf{P}) = 2x$.    2.63

The effect of error suppression is significant here for all probability distributions which can be approximated by linear polynomials, in fact significant enough to transform a $3\sigma$ discrepancy into a $5\sigma$ effect.

Although in more complex cases the suppression effect may be less than in 2.63, the above numbers are not as far as one might suppose from reality. Indeed, it is in general possible to change variables appropriately and integrate out inessential components to reduce a generic multidimensional case to the one-dimensional. A realistic example is discussed in Sec. 4.68.


More on regularization of cuts 2.64



It appears necessary to state that there is absolutely no physical 
wisdom in preferring event selection (equivalent to dichotomic ob- 
servables) over continuous weights. So, in view of the very general 
mechanism of error suppression in the case of continuous observ- 
ables and the simplicity of regularization prescriptions, one has to 
explain not why one should regularize the cuts but why one does not 
do so. 
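To make the size of this error suppression concrete, the numbers quoted in 2.63 can be reproduced with a short numerical check. The sketch below is an illustration only: it assumes that the event is a single number $x$ in [0, 1], uses a simple midpoint rule for the averages, and all names (`relative_sigma`, `suppression_ratio`, etc.) are hypothetical.

```python
import math

def relative_sigma(f, density, steps=100000):
    """sqrt(Var f) / <f> for x in [0, 1] distributed with the given density.

    The averages are evaluated with the midpoint rule on `steps` cells.
    """
    h = 1.0 / steps
    mean = 0.0
    mean_sq = 0.0
    for k in range(steps):
        x = (k + 0.5) * h
        weight = density(x) * h
        value = f(x)
        mean += weight * value
        mean_sq += weight * value * value
    return math.sqrt(mean_sq - mean * mean) / mean

def hard_cut(x):
    """f_1: the dichotomic observable theta(x > 1/2)."""
    return 1.0 if x > 0.5 else 0.0

def continuous_analog(x):
    """f_2: the continuous weight x."""
    return x

def suppression_ratio(density):
    """sigma_1 / sigma_2 of Eq. 2.63 for the given probability density."""
    return relative_sigma(hard_cut, density) / relative_sigma(continuous_analog, density)
```

Under these assumptions `suppression_ratio(lambda x: 1.0)` comes out close to $\sqrt{3} \approx 1.73$ and `suppression_ratio(lambda x: 2.0 * x)` close to $2\sqrt{2/3} \approx 1.63$, in agreement with the rounded values 1.7 and 1.6 in 2.63.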

Recall in this respect that the commonly used statistical methods 
such as histogramming originally emerged in the context of applica- 
tions such as demography and agriculture, not high-precision parti- 
cle physics experimentation where the proliferation of cuts in data 
processing elevates them to the level of a first-order algo- 
rithmic/mathematical phenomenon. For instance, the procedures for 
smoothing conventional histograms found in standard numerical 
packages are not the same as building histograms with regularized 
bins: the former entail a loss of numerical information, the latter enhance it by suppressing errors. (See e.g. sec. 12.9 of [4]. A closely related mathematical technique is the wavelet analysis [12].)

Perhaps it ought to be considered an element of basic culture in 
data processing that an event should always be accompanied by a 
real weight. (There might be advantages in allowing the lower levels 
of the detector facility to yield events with weights not equal to 1 
from the very beginning.) Computer memory is cheap enough that an extra four bytes per event should not be a burden, and one can always revert to dichotomic weights, but one never quite knows what one loses precision-wise when one sequentially applies a dozen hard cuts to one's events, losing a few % of precision at each hard cut. Modest as the bang here may be, on a per buck basis it is certainly greater than with any hardware upgrade.
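The per-event weight advocated here costs very little code. The following is a minimal sketch in Python, not the author's implementation: the `Cut` container, the dictionary-based event records and all function names are hypothetical, while the ramp and the sums follow 2.57, 2.53 and 2.61.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

def ramp(value, c_lo, c_hi):
    """Regularized step function of Eq. 2.57 (requires c_lo < c_hi)."""
    if value >= c_hi:
        return 1.0
    if value <= c_lo:
        return 0.0
    return (value - c_lo) / (c_hi - c_lo)

@dataclass
class Cut:
    """One regularized selection criterion phi(P) > c_cut (hypothetical container)."""
    phi: Callable[[dict], float]  # continuous function of the event
    c_lo: float                   # lower end of the regularization interval
    c_hi: float                   # upper end of the regularization interval

def event_weight(event, cuts: Sequence[Cut]) -> float:
    """Start from weight 1 and multiply by each regularized cut (Eq. 2.53)."""
    weight = 1.0
    for cut in cuts:
        weight *= ramp(cut.phi(event), cut.c_lo, cut.c_hi)
        if weight == 0.0:
            break  # the event is dropped, exactly as with a hard cut
    return weight

def regularized_mean(events, cuts, f=lambda event: 1.0) -> float:
    """Eq. 2.61: N^-1 sum_i w_i f(P_i); with f = 1 this is the event fraction."""
    if not events:
        raise ValueError("need at least one event")
    return sum(event_weight(e, cuts) * f(e) for e in events) / len(events)
```

For three toy events with total energies 5, 10 and 15 and a cut regularized over the interval (8, 12), the weights are 0, 0.5 and 1, so the regularized event fraction is 0.5 rather than the 1/3 or 2/3 that a hard cut placed on either side of the middle event would give.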

The widely spread way of thinking in terms of "event selection" 
as a primary tool of data processing is based on a mental attitude 
which could be explained by: 

• The limitations of computer resources in the past — a factor 
which seems to be much alleviated thanks to Moore's law. 

• The fact that standard textbooks teach probability in the spirit of 
the Kolmogorov axiomatics in terms of subsets and the correspond- 
ing probabilities. For such axiomatics, the issues of continuity in the 
cases when random events occur in a continuum, are extraneous. 

• The penchant for thinking physics in terms of "regions of phase 
space" rather than continuously varying observables. This has some 
foundation in the cases when the events can be tagged somehow 
[e.g. in the case of (approximate) superselection rules] but not in 
QCD situations typically encountered in problems involving jet 
counting. Identifying the most interesting region of phase space is a 
useful heuristic but ought to be regarded as only a first step in the 
construction of the observable. 

Interestingly, a similar way of thinking in terms of "regions" 
proved to be detrimental for the theory of Feynman diagrams (see 
the comments in the E-print posting of [16]). Apparently, the association of "regions" with "physics" is a piece of mythology deeply rooted in the intellectual culture of the high-energy physics community.
It may perhaps be partly connected to a subconscious rejection of the 
quantum mechanical notion that it is impossible as a matter of prin- 
ciple to tell which hole an interfering electron passed through. 

This issue also seems to be psychologically related to the common insistence that Monte Carlo event generators produce events with unit weights. Even if one uses event selection, the most basic observables are probabilities, so neither individual events nor their (integer) number are fundamental: only event fractions are, as the simplest estimators of the corresponding probabilities. But then it is totally irrelevant whether the estimate is obtained by counting events or summing their fractional weights; the result will be fractional anyway.

Fractional weights accompanying events in the process of event 
selection would nicely mesh with other related experimental notions 
(such as the probabilities for a detected particle to be an electron) 
and with fractional weights accompanying MC-generated events. 
Theoretical estimates of the event fraction fluctuations for a given 
corner of phase space seem anyway to be best done by evaluating 
the variance of the corresponding weight function (because then ad- 
aptation techniques in integration routines may be used for greater 
computational efficiency). 

Note that (pseudo) events with fractional weights occur naturally 
when one attempts to restore the partonic event (see Section 9). This 
is in fact similar to how experimentalists restore observed events 
from detector signals. 

In short, the absolutization of event selection blinds one to some 
useful options in data processing. 

Observables in QCD. Kinematical aspects 3 

In the preceding section we have introduced the notion of (quasi-) optimal observables for precision measurements of fundamental parameters. Such observables allow one to approach the theoretically possible precision for the parameters with a given event sample. We found that optimal observables are given by an explicit formula in terms of the probability density $\pi(\mathbf{P})$ (Eq. 2.17). In QCD, however, one may have a Monte Carlo event generator with a dependence on fundamental parameters built in, but no algorithm to evaluate $\pi(\mathbf{P})$ for a given event $\mathbf{P}$. In such a situation it is reasonable to construct observables incrementally by combining as many properties from the optimal ones as possible.

There are two types of such properties: kinematical and dynamical. The kinematical properties reflect requirements of two kinds: experimental (appropriate continuity to suppress sensitivity to statistical fluctuations and detector errors in data) and theoretical (conformance to structural properties of quantum field theory in general and QCD in particular, in order to enhance quality of theoretical predictions). Dynamical properties reflect the specific behavior of $\pi(\mathbf{P})$ such as predominant production of certain types of events. In general, the result 2.17 incorporates both kinematical and dynamical restrictions, with the former playing the role of a fine-tuning for the latter. However, the specifics of QCD dynamics (a fast variation of $\pi(\mathbf{P})$ between the points in the space of $\mathbf{P}$ which are close in the sense of C-continuity; see [4]) enhance the role of kinematical considerations (see the example in Sec. 4.68).

A systematic study of QCD observables from a kinematical viewpoint (continuity and sensitivity to errors, and compatibility with quantum field theory) was performed in [3]-[5]. In this section we review the findings of [3]-[5].

Notations. Representations of events 3.1

The beam axis is $z$ and it corresponds to the 3rd component of 3-vectors. The polar angle $\theta$ is measured from the beam axis, and the azimuthal angle $\varphi$ is defined accordingly.

Let $E^{\mathrm{cal}}$ be the calorimetric energy, i.e. the number measured by a calorimetric cell. It is usually interpreted as the time-like component of the 4-momentum of the particle which hit the cell. It is assumed sufficient to treat all particles as massless, so that their energies are not distinguished from absolute values of their 3-momenta.

In jet studies one deals with two physical situations in 
which slightly different kinematical aspects are emphasized. 
This is reflected in how jets are looked at: 

When studying processes with c.m.s. jet production (mostly $e^+e^-$ annihilation), spherical symmetry is emphasized, and so one works within spherical kinematics, dealing with points of unit sphere represented either by the pair of angles $\theta$, $\varphi$ or by unit 3-vectors denoted as $\hat{p}$, $\hat{q}$, etc.

When studying hadron collisions, the colliding partons' rest frame is unknown so that invariance with respect to boosts along the beam axis has to be maintained. Then one works within cylindrical kinematics and introduces the so-called transverse energy

$E^{\perp} = E^{\mathrm{cal}} \sin\theta = \sqrt{p_1^2 + p_2^2}$    3.2

and pseudorapidity

$\eta = \ln\cot(\theta/2)$, $-\infty < \eta < +\infty$.    3.3

Then a massless 4-momentum $p = (E^{\mathrm{cal}}, p_1, p_2, p_3)$ is represented as

$p = E^{\perp}(\cosh\eta, \cos\varphi, \sin\varphi, \sinh\eta)$.    3.4

Boosts along the beam axis correspond to shifts of $\eta$.
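The relations 3.2-3.4 and the statement about boosts are easy to verify numerically. The sketch below is illustrative only; the function names are hypothetical, angles are in radians, and the polar angle is assumed to lie strictly between 0 and $\pi$ so that the pseudorapidity is finite.

```python
import math

def to_cylindrical(e_cal, theta, phi):
    """(E_cal, theta, phi) -> (E_perp, eta, phi) following Eqs. 3.2 and 3.3."""
    e_perp = e_cal * math.sin(theta)
    eta = math.log(1.0 / math.tan(0.5 * theta))  # ln cot(theta / 2)
    return e_perp, eta, phi

def four_momentum(e_perp, eta, phi):
    """Massless 4-momentum (E, p1, p2, p3) in the form of Eq. 3.4."""
    return (
        e_perp * math.cosh(eta),
        e_perp * math.cos(phi),
        e_perp * math.sin(phi),
        e_perp * math.sinh(eta),
    )

def boost_along_beam(p, rapidity):
    """Lorentz boost along the beam (3rd) axis with the given rapidity."""
    energy, p1, p2, p3 = p
    ch, sh = math.cosh(rapidity), math.sinh(rapidity)
    return (energy * ch + p3 * sh, p1, p2, p3 * ch + energy * sh)
```

Boosting `four_momentum(e_perp, eta, phi)` with rapidity `y` gives the same components as `four_momentum(e_perp, eta + y, phi)`: the transverse energy and the azimuthal angle are untouched and only the pseudorapidity is shifted.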

Particles and events 3.5 

Let P be the event as seen by an ideal calorimetric detector 
installation. Then P is a collection of "particles" which can be 
just calorimetric cells lit up by the event. Particles in the event 
will be enumerated using the labels $a$, $b$. The $a$-th particle/cell is represented by its energy $E_a$ and direction $\hat{p}_a$. Formally:

$\mathbf{P} = \{E_a, \hat{p}_a\}_{a = 1 \ldots N(\mathbf{P})}$,    3.6

where $N(\mathbf{P})$ is the total number of particles in $\mathbf{P}$.

It is convenient to allow particles with zero energy in 3.6. 
This corresponds to the fact that a low-energy particle may not light up the cell it hits.

In what follows we will be talking about partonic events, 
hadronic events, jet configurations, etc. They are all objects of 
the same type 3.6. 

The meaning of the energies $E_a$ depends on the chosen kinematics:

$E_a = \begin{cases} E_a^{\mathrm{cal}} & \text{spherical kinematics;} \\ E_a^{\perp} & \text{cylindrical kinematics.} \end{cases}$    3.7



The directions $\hat{p}$ can be represented in different ways (e.g. by $\varphi$ and $\theta$; by a unit 3-vector; etc.), but all the reasoning until Sec. 7 is independent of the representation. All we need is the usual angular distance between two directions, $|\hat{p} - \hat{q}|$, which is defined unambiguously.

• For definiteness, we will always be talking about spherical 
kinematics in what follows. Then $d\hat{p}$ is an infinitesimal element of the surface of the unit sphere. The final prescriptions
for jet definition will be formulated independently of this as- 
sumption. 

Events as measures 3.8 

We are actually interested in events as seen by a purely 
calorimetric detector installation, i.e. energy flows. Energy 
flow is insensitive to fragmentations of any particle of the 
event into any number of collinear fragments directed the same 
as the parent particle and carrying the same total energy. How- 
ever, the representation 3.6 is defective in this respect in that it 
is not explicitly fragmentation-invariant. 

The following representation of events-as-energy-flows was found to respect physical requirements to maximal degree (see [4] and the reasoning below):

$\sum_a E_a\, \delta(\hat{p}, \hat{p}_a) = \mathbf{P}(\hat{p})$.    3.9

Here the $\delta$-functions obey the usual rules of integration over the unit sphere:

$\int_{\text{unit sphere}} d\hat{p}\; \delta(\hat{p}, \hat{p}_a)\, d(\hat{p}) = d(\hat{p}_a)$    3.10

for any continuous function on the unit sphere $d(\hat{p})$.

In mathematical terms, the object 3.9 is a measure on the unit sphere. By definition, it acquires a numerical meaning after integrations with continuous functions:

$\langle \mathbf{P}, d \rangle = \int_{\text{unit sphere}} d\hat{p}\; \mathbf{P}(\hat{p})\, d(\hat{p}) = \sum_a E_a\, d(\hat{p}_a)$.    3.11

In other words, Eq. 3.9 is essentially a convenient shorthand notation for the collection of values 3.11 for all such $d(\hat{p})$:

$\mathbf{P} \leftrightarrow \{\langle \mathbf{P}, d \rangle\}$, where $d(\hat{p})$ are all continuous functions on unit sphere.    3.12

• The expression 3.9 is explicitly fragmentation invariant, as are Eq. 3.11 and the r.h.s. of 3.12.
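Fragmentation invariance of the numbers 3.11 can be seen in a small numerical experiment. The following sketch is an illustration under stated assumptions: an event is stored as a list of `(energy, direction)` pairs in the bookkeeping form 3.6, directions are unit 3-vectors, and the names `pairing`, `fragment` and `direction` are hypothetical.

```python
import math

def direction(theta, phi):
    """Unit 3-vector for the polar angle theta and azimuthal angle phi."""
    return (
        math.sin(theta) * math.cos(phi),
        math.sin(theta) * math.sin(phi),
        math.cos(theta),
    )

def pairing(event, d):
    """<P, d> = sum_a E_a d(p_a), the right-hand side of Eq. 3.11."""
    return sum(energy * d(p_hat) for energy, p_hat in event)

def fragment(event, index, fractions):
    """Split particle `index` into collinear fragments with the given energy fractions."""
    if abs(sum(fractions) - 1.0) > 1e-12:
        raise ValueError("the energy fractions must sum to 1")
    energy, p_hat = event[index]
    pieces = [(energy * z, p_hat) for z in fractions]
    return event[:index] + pieces + event[index + 1:]
```

For any continuous `d`, `pairing(event, d)` and `pairing(fragment(event, 0, [0.2, 0.3, 0.5]), d)` agree up to rounding, although the second event contains more "particles": the list representation changes, the measure does not.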



Calorimetric detector cells 3.13

Elementary calorimetric cells are naturally represented by $d(\hat{p})$ corresponding to their idealized angular acceptance functions: such $d(\hat{p})$ takes the value 1 inside some small angular region, and continuously falls off to zero outside that region, so that if $E$, $\hat{p}$ are the particle's energy and direction then the energy detected by the cell is $E\,d(\hat{p})$ (the closer to the cell's boundary the particle hits the cell, the smaller the fraction of the energy registered by the cell). Then the energy which the cell $d$ sees when confronted with the event $\mathbf{P}$ is given by 3.11.

In view of this interpretation, it becomes physically transparent why the event-as-energy-flow is equivalent to the collection of values 3.12. In practice one deals with a finite collection of calorimetric modules $d_a$, and with the corresponding finite collection of numbers $\langle \mathbf{P}, d_a \rangle$ for each event $\mathbf{P}$. These numbers constitute the experimentally measured approximation to the ideal information content of $\mathbf{P}$:

$\mathbf{P}^{\mathrm{exp}} = \{\langle \mathbf{P}, d_a \rangle\}_a$.    3.14

If the angular size of $d_a$ (= the size of the angular region in which $d_a \neq 0$) is sufficiently small, $d_a$ is represented by a direction $\hat{p}_a$, and we come back to 3.6.

This proves that the representation of event-as-energy-flow by the collection 3.12 is equivalent to the conventional "particle" representation 3.6, with one important improvement: unlike the numbers which constitute 3.6, each number in the collection 3.12 is fragmentation invariant.

• In what follows we will interpret events in the sense of 3.9 
and 3.12 (the latter is just a shorthand notation for the former), 
treating the representation 3.6 as a bookkeeping device. 



The domain $\mathcal{P}$ 3.15

It is convenient to impose the following restriction on events:

$\sum_a E_a \leq 1$.    3.16



This is because the events' energies are bounded by a constant 
in any experiment, and the structure of energy flow is inde- 
pendent of the event's total energy, at the basic level of so- 
phistication. 

It would be sufficient to have the equality in the above re- 
striction. The inequality is allowed only because of the formal 
convenience resulting from the use of the linear structure in the 
space of events represented as measures 3.9. 

The following collection of events will be the arena of much 
of the subsequent mathematical action: 



$\mathcal{P}$ = all events $\mathbf{P}$ which satisfy Eq. 3.16.    3.17



C-continuous observables 3.18

We are dealing with observables $f(\mathbf{P})$ defined on events $\mathbf{P}$ from the domain $\mathcal{P}$. We saw in Sec. 2.48 that smearings due to detector errors cause the probability distribution of observed events and, therefore, optimal observables 2.17 to possess special continuity properties which we are now going to study.

Note that the same notion of C-continuous observables will 
reemerge from analysis of predictive power of pQCD 
(see comments after 4.2). This is because the choice of calo- 
rimetric detectors for measurements is determined by the 
limitations of predictive power of pQCD in regard of hadronic 
events [8]. Therefore, C-continuity is a fundamental notion in 
the theory of jet observables. 

Before we turn to precise formulations, note the following. 

Any function $f(\mathbf{P})$, when considered on events with exactly $N$ particles, becomes an ordinary function of $N$ composite arguments:

$f(\mathbf{P}) \to f_N(\{E_1, \hat{p}_1\}, \ldots, \{E_N, \hat{p}_N\})$.    3.19

Then $f(\mathbf{P})$ as a whole is a sequence of such component functions $f_N$, $N = 1, \ldots, \infty$. Such a representation in terms of component functions is natural from the viewpoint of perturbative QCD where one deals with a small number of particles in each order of perturbation theory (cf. [17]).

However, similarly to how the naive representation of events 3.6 is insufficient in that it is not explicitly fragmentation invariant and thus potentially misleading in the construction of data processing algorithms, so the corresponding representation of observables 3.19 may also be insufficient. In particular, it would be hard to formulate C-continuity in terms of 3.19.

C-convergence of events 3.20 

To define continuity of a function $f(\mathbf{P})$ one first has to establish the notion of convergence of its arguments, in our case the events $\mathbf{P}$. The issue is non-trivial here because our events $\mathbf{P}$ run over the infinitely-dimensional domain $\mathcal{P}$, and in infinitely-dimensional spaces many radically non-equivalent notions of convergence are possible. So, when does a sequence of events $\mathbf{P}_n$ converge to an event $\mathbf{P}$?

For instance, a naive convergence defined on the basis of 3.6 would be to require convergence of all numerical "components" of 3.6. Namely, one would require that $N(\mathbf{P}_n) \to N(\mathbf{P})$, which would mean that $N(\mathbf{P}_n) = N(\mathbf{P})$ for all sufficiently large $n$. Then one would require that the energy and direction of each particle from $\mathbf{P}_n$ converge to the energy and direction of some particle in $\mathbf{P}$. However, this is clearly inadequate because an event consisting of one narrow cluster of particles which gets narrower as $n \to \infty$ may converge in an intuitive physical sense to a one-particle event even if the distribution of energies between particles in $\mathbf{P}_n$ wildly fluctuates with changing $n$.

To obtain a correct answer one should realize that conver- 
gences such as the one being discussed are simply a mathe- 
matical way to describe the general structure of one's meas- 
urement devices, so that the corresponding continuity of ob- 
servables would ensure their stability with respect to detector 
errors. 

In our case the correct choice is the so-called
C-convergence.^{g,h} Its definition is directly connected to how
calorimetric detector cells see events:



The sequence of events $P_n$ is said to C-converge to P if $P_n$
in the limit of $n \to \infty$ become indistinguishable from P for any
calorimetric detector cell $d$, i.e.

$(P_n, d) \to (P, d)$    3.21

in the usual numerical sense for each continuous function
$d(\hat{p})$ defined on the unit sphere.



One could use here special $d$ corresponding to realistic
detector cells and described in Sec. 3.13, but the extension by
linearity to arbitrary continuous functions is convenient and
neither restricts nor relaxes the definition.
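
The contrast between the naive convergence and the definition 3.21 can be checked numerically. The following is only a sketch under stated assumptions: events are represented as lists of (energy, unit vector) pairs, the cell function `cell` is an arbitrarily chosen continuous function on the sphere, and all names (`unit`, `pairing`, `cell`, `narrowing_pair`) are illustrative rather than taken from the paper.

```python
import math

def unit(theta, phi):
    """Unit 3-vector with polar angle theta and azimuth phi."""
    return (math.sin(theta) * math.cos(phi),
            math.sin(theta) * math.sin(phi),
            math.cos(theta))

def pairing(event, d):
    """(P, d) = sum over particles of E_a * d(p_a), cf. 3.11."""
    return sum(E * d(p) for E, p in event)

Q_AXIS = unit(0.3, 0.0)   # centre of the illustrative cell function

def cell(p):
    """A continuous test function on the unit sphere (assumed choice)."""
    return 0.5 * (1.0 + sum(a * b for a, b in zip(p, Q_AXIS)))

def narrowing_pair(n):
    """Two particles at polar angle 1/n on opposite sides of the z axis,
    with an energy sharing that fluctuates erratically with n."""
    share = 0.5 + 0.4 * math.sin(n)
    return [(share, unit(1.0 / n, 0.0)),
            (1.0 - share, unit(1.0 / n, math.pi))]

P = [(1.0, unit(0.0, 0.0))]   # the one-particle limit event

# |(P_n, d) - (P, d)| for increasingly narrow pairs
gaps = [abs(pairing(narrowing_pair(n), cell) - pairing(P, cell))
        for n in (1, 10, 100, 1000)]
```

The number of particles stays equal to 2 for every n and the energy sharing never settles, so the sequence does not converge in the naive sense, yet the values in `gaps` shrink towards zero as the pair narrows.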

The convergence 3.21 can be described in a more conven- 
tional fashion using an appropriately chosen measure of dis- 
tance between events (Sec. 3.23). 

Formulation in terms of open sets 3.22 

The above formulation is equivalent to the following one 
phrased in a canonical mathematical language. For simplicity 
we ignore statistical fluctuations of the errors; our purpose is 
only to show how the basic structure of detector errors uniquely 
determines the topology (convergence) in the space of events. 



g C from "calorimetric"; we will also use the verb to C-converge, etc. 

h In terms of pure mathematics, the C-convergence is an instance of the
so-called *-weak topology in the space of linear functionals [18].

An elementary measurement device $a$ (a calorimetric cell in
our case) yields a non-empty interval of real numbers $(r', r'')$
for each instance of measurement. Consider the subset of all
events which could have produced the same interval (which
without loss of generality can be taken to be open) and denote
it $O_{a,r',r''}$.

A complex detector installation consists of a finite number
of such devices, and each instance of measurements actually
registers a subset of events which corresponds to the
intersection of $O_{a,r',r''}$ for all elementary devices which
constitute the detector.

The sets $O_{a,r',r''}$ constitute the so-called subbase which
uniquely determines a topology in the space of events.

The convergence described by 3.21 is equivalent to the to- 
pology obtained in this way for elementary measurement de- 
vices described by 3.11 (cf. Sec. 3.13). 

Distance to quantify similarity of events 3.23 

It may be helpful to point out a single numeric measure of 
distance between events P which would correspond to C-con- 
vergence. The distance is fully constructive (although a bit 
cumbersome) and corresponds to the intuitive notion of simi- 
larity of two events at various angular resolutions. 

Define: 

$\psi(x) = \exp\left[-(1-x)^{-1}\right]$ for $x < 1$,
$\psi(x) = 0$ for $x \geq 1$;    3.24

$d_{R,\hat{q}}(\hat{p}) = \psi(\theta_{\hat{p}\hat{q}} / R).$    3.25

This describes an ideal calorimetric cell of radius $R$ centered at
$\hat{q}$. (It would have been sufficient for each $R$ to restrict
$\hat{q}$ to a finite grid of points so that each point of the unit
sphere is no farther than $R/2$ from the nearest point of the grid.)

The following expression is interpreted as the distance be- 
tween P and Q at the angular resolution R : 

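$\mathrm{dist}_R(P, Q) = \max_{\hat{q}} \left| (P, d_{R,\hat{q}}) - (Q, d_{R,\hat{q}}) \right|$

$= \max_{\hat{q}} \left| (P - Q, d_{R,\hat{q}}) \right|.$    3.26

It is bounded by 1 if both events belong to $\mathcal{P}$.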

To obtain a measure of distance for all angular resolutions,
simply take a sum over increasingly better resolutions $R_n \to 0$:

$\mathrm{Dist}(P, Q) = \sum_{n=1,2,\ldots} \xi_n \, \mathrm{dist}_{R_n}(P, Q).$    3.27

The sequence $R_n$ is otherwise arbitrary, e.g. $R_n = 2^{-n}$.

The sum of positive coefficients $\xi_n$ must be finite. We
normalize them so that

$\sum_{n=1,2,\ldots} \xi_n = 1.$    3.28

This ensures the following normalization of Dist:

$\mathrm{Dist}(P, Q) \leq 1$ for any P and Q from $\mathcal{P}$.    3.29

Verbally, each next term in the sum 3.27 describes the
difference between P and Q at a higher angular resolution. The
rate of decrease of $\xi_n$ as $n \to \infty$ controls sensitivity of 3.27 to
the differences between P and Q at higher angular resolutions.
For instance, $\xi_n = 2^{-n}$.

The decreasing sensitivity of the expression 3.27 to
correlations between P and Q at smaller angular distances nicely
reflects the decreasing physical importance of such correlations.

The usual definition of convergent sequences based on this
measure of distance in $\mathcal{P}$,

$\mathrm{Dist}(P_n, P) \to 0,$    3.30

is equivalent to the C-convergence, Eq. 3.21.
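
The construction 3.24-3.27 is simple enough to try out directly. The sketch below is not the author's implementation and rests on several assumptions: the bump profile in `psi` is one possible choice of a function vanishing for x >= 1, the maximum over cell centres is restricted to the particle directions of the two events (which gives only a lower estimate of 3.26 instead of a scan over a full grid), and the infinite sum is truncated at `n_max` terms with $R_n = 2^{-n}$ and weights $2^{-n}$.

```python
import math

def psi(x):
    """Bump profile: positive for x < 1, zero for x >= 1 (assumed form)."""
    return math.exp(-1.0 / (1.0 - x)) if x < 1.0 else 0.0

def angle(p, q):
    """Angular distance between unit vectors p and q."""
    dot = sum(a * b for a, b in zip(p, q))
    return math.acos(max(-1.0, min(1.0, dot)))

def dist_R(P, Q, R):
    """Lower estimate of 3.26: cell centres are taken only at the
    particle directions of P and Q rather than on a full grid."""
    centres = [p for _, p in P] + [p for _, p in Q]

    def seen(event, q):
        # energy registered by the ideal cell of radius R centred at q
        return sum(E * psi(angle(p, q) / R) for E, p in event)

    return max(abs(seen(P, q) - seen(Q, q)) for q in centres)

def Dist(P, Q, n_max=12):
    """Truncated sum 3.27 with R_n = 2**-n and weights 2**-n."""
    return sum(2.0 ** -n * dist_R(P, Q, 2.0 ** -n)
               for n in range(1, n_max + 1))

def unit(theta, phi):
    """Unit 3-vector with polar angle theta and azimuth phi."""
    return (math.sin(theta) * math.cos(phi),
            math.sin(theta) * math.sin(phi),
            math.cos(theta))

one = [(1.0, unit(0.0, 0.0))]
split = [(0.3, unit(0.0, 0.0)), (0.7, unit(0.0, 0.0))]    # exactly collinear
wide = [(0.5, unit(0.2, 0.0)), (0.5, unit(0.2, math.pi))]
narrow = [(0.5, unit(0.002, 0.0)), (0.5, unit(0.002, math.pi))]
```

With these toy events, `Dist(one, split)` vanishes up to rounding, and `Dist(one, narrow)` is much smaller than `Dist(one, wide)`: the narrower the pair, the closer it is to the one-particle event, irrespective of the particle count.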



• The above definition of Dist resembles constructions of the
wavelet analysis [12] with $\psi(x)$ corresponding to the mother
wavelet. This is not the only place where the logical patterns of
the wavelet analysis come to the surface in our theory (cf. the
comments after 2.46).

• A mathematician would note that the closure of $\mathcal{P}$ is
compact with respect to the C-convergence. This is a special case
of the Banach-Alaoglu theorem [18]. This is important for the
study of the structure of C-continuous observables (Sec. 3.35).

• Although one may be psychologically more comfortable 
with the definition of convergence in the space of events in 
terms of a single numeric measure of distance 3.30 rather than 
the seemingly more amorphous definition 3.21, the latter is 
deeper and is actually simpler. The possibility to express the 
convergence in terms of one distance 3.30 is accidental and its 
form exhibits too many inessential details. Eq. 3.21, on the 
other hand, goes to the heart of the matter by directly reflecting 
the structure of multimodule detectors and leading to the pro- 
found identification 3.43. The usefulness of the entire logical 
pattern rooted in the definition 3.21 is demonstrated by the 
derivation of jet definition in Section 6 — it is not clear what 
heuristics one would have been guided by should one decide to 
work in terms of the distance 3.30. 

C-continuity of observables 3.31 

The formal definition is as follows: 



An observable $f(P)$ defined on events from $\mathcal{P}$ is
C-continuous if

$f(P_n) \to f(P)$    3.32

whenever $P_n \to P$ in the sense of C-convergence (3.21 or 3.30).

Qualitatively, C-continuity is the same as stability with re- 
spect to distortions of energy flow deemed physically less sig- 
nificant in jet-related measurements (such as due to minor re- 
arrangements of detector cells, several particles hitting the 
same cell, detector errors, etc.). Such distortions may cause the 
numbers which constitute 3.6 (e.g. the observed number of 
particles) to jump erratically, whereas the values of C-con- 
tinuous shape observables would exhibit continuous variations. 

C-continuity and fragmentation invariance 3.33 

Since the definition of C-convergence is entirely in terms of
the fragmentation-invariant representation of events 3.9, a
function $f(P)$ that is C-continuous is automatically
fragmentation invariant (if Q differs from P by exactly collinear
fragmentations then $\mathrm{Dist}(Q, P) = 0$, so Eq. 3.32 implies that
$f(Q) = f(P)$).

Furthermore, each of the component functions $f_N$ (see 3.19)
is continuous in all its arguments. However, the latter property
is not sufficient to ensure C-continuity of $f(P)$ (see sec. 6.9 of
[4]). This is essentially because C-continuity imposes
restrictions on the allowed rate of variation of all component
functions $f_N$ simultaneously. From the viewpoint of pQCD, such a
requirement connects all orders of perturbation theory, and
therefore is inherently non-perturbative.
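
As a small numerical illustration (not taken from the paper), the sketch below compares two observables on a one-particle event and on its exactly collinear fragmentation. The function names and the toy events are assumptions made for the example. Both observables have continuous N-particle components, but only the first is fragmentation invariant, so the second cannot be C-continuous. This is a simpler failure mode than the one discussed in sec. 6.9 of [4], but it shows that continuity of the components alone settles nothing.

```python
def total_energy(event):
    """f(P) = sum_a E_a: a basic shape observable with f(p) = 1."""
    return sum(E for E, _ in event)

def sum_of_squares(event):
    """f(P) = sum_a E_a**2: every N-particle component is a polynomial,
    hence continuous, but the value changes under collinear splitting."""
    return sum(E * E for E, _ in event)

z = (0.0, 0.0, 1.0)
one = [(1.0, z)]
split = [(0.25, z), (0.75, z)]   # exactly collinear fragmentation of `one`

values = {
    "total_energy": (total_energy(one), total_energy(split)),
    "sum_of_squares": (sum_of_squares(one), sum_of_squares(split)),
}
```

Here `values["total_energy"]` contains two equal numbers, whereas `values["sum_of_squares"]` does not, although the two events are at zero distance in the sense of 3.27.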



C-continuity is a combination of fragmentation invariance and a 
special continuity in particles' parameters, formulated without refer- 
ence to the structure of perturbative partonic states. 

3.34 



The usual shape observables such as thrust and the jet number
discriminators (as well as the classes of observables
described in [4]) are C-continuous whereas the thrust axis is not.
Neither the number of jets nor the parameters of individual jets
are C-continuous, irrespective of the jet definition adopted.
On the other hand, the prescriptions of Section 9 eliminate 
(most) C-discontinuities from observables constructed on the 
basis of jet configurations found by the optimal jet definition 
introduced in this paper. 

• Concerning the regularization prescriptions of Sec. 2.52, we
note that if the l.h.s. of 2.54 is C-continuous (which is often the
case in practical situations) then so is 2.57.

Structure of the space of C-continuous functions 3.35



The simplest example of C-continuous functions is
immediately deduced from the definitions 3.32 and 3.21. Suppose
$f(\hat{p})$ is continuous everywhere on the unit sphere. Then the
function $f(P)$ defined on events^{i} according to

$f(P) = (P, f) = \sum_a E_a f(\hat{p}_a)$    3.36

(cf. 3.11) is C-continuous by definition. Such $f(P)$ will be
called basic shape observables. They will be further discussed
in Sec. 3.43.

Furthermore, arbitrary C-continuous functions can be
approximated by algebraic combinations of the basic shape
observables in a fashion similar to how arbitrary continuous
functions on, say, unit cube can be approximated by ordinary
polynomials.^{j} This analogy is illustrated by the following table:



vector $P = (P_1, \ldots)$           | event P
unit cube $0 \leq P_i \leq 1$        | the domain $\mathcal{P}$ (3.17)
continuity                           | C-continuity
linear functions $\sum_i r_i P_i$    | basic shape observables (Eq. 3.36)
products of linear functions         | (multi-)energy correlators
  (monomials)                        |   (Eq. 3.40)
continuous functions $f(P)$          | C-continuous observables $f(P)$
                                     |   (generalized shape observables)

                                                                  3.37



i Note a convenient abuse of notation: both the angular function and the
corresponding shape observable are denoted by the same symbol $f$.
Interpretation depends on the type of arguments.

j A well-known theorem due to Weierstrass. Its generalization needed for
our purposes is known as the Stone-Weierstrass theorem [18]. A
mathematician would easily supply the details which physicists, however,
won't care about because they don't lead to useful algorithms.

The approximation meant here is in the usual uniform
sense, i.e. for any $\epsilon > 0$, an arbitrary C-continuous function
$f(P)$ can be approximated by a sum of energy correlators $f'(P)$
so that:

$\sup_{P \in \mathcal{P}} |f(P) - f'(P)| < \epsilon.$    3.38

The two classes of observables shown in the right column 
(basic shape observables and energy correlators) play special 
roles from the viewpoint of the underlying physical formalism. 
We have already seen that the basic shape observables 3.36 are 
singled out by their relation to the structure of elementary de- 
tector modules (we will return to this in Sec. 3.42). Let us now 
discuss the energy correlators. 

Energy correlators 3.39 

These have the form

$f(P) = \sum_{a_1 \ldots a_n} E_{a_1} \cdots E_{a_n} f_n(\hat{p}_{a_1}, \ldots, \hat{p}_{a_n}),$    3.40

where $f_n$ is a symmetric continuous function of $n$ arguments.
Basic shape observables are special cases corresponding to
$n = 1$.

The component functions 3.19 and the correlators 3.40 can
be regarded as different bases in terms of which to express
general C-continuous observables. One function $f_n$ in 3.40
corresponds to an infinite sequence of component functions 3.19.
On the other hand, Eq. 3.40, unlike 3.19, is automatically
fragmentation invariant.
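
A direct transcription of 3.40 makes the automatic fragmentation invariance easy to verify on toy events. This is a sketch under stated assumptions: events are lists of (energy, unit vector) pairs, the angular function `one_minus_cos` is an arbitrary symmetric continuous choice, and the helper names are illustrative.

```python
import itertools
import math

def correlator(event, f_n, n):
    """Eq. 3.40: sum over all n-tuples (a_1, ..., a_n) of particles of
    E_a1 * ... * E_an * f_n(p_a1, ..., p_an)."""
    total = 0.0
    for combo in itertools.product(event, repeat=n):
        weight = math.prod(E for E, _ in combo)
        total += weight * f_n(*(p for _, p in combo))
    return total

def one_minus_cos(p, q):
    """Symmetric continuous angular function; vanishes for collinear p, q."""
    return 1.0 - sum(a * b for a, b in zip(p, q))

z = (0.0, 0.0, 1.0)
minus_z = (0.0, 0.0, -1.0)

back_to_back = [(0.5, z), (0.5, minus_z)]
# the same event after an exactly collinear splitting of the first particle
fragmented = [(0.2, z), (0.3, z), (0.5, minus_z)]

c2_original = correlator(back_to_back, one_minus_cos, 2)
c2_fragmented = correlator(fragmented, one_minus_cos, 2)
```

Both values coincide (up to rounding), although the underlying component functions 3.19 have two and three arguments respectively. The brute-force loop costs a number of steps equal to the n-th power of the particle count, so it is meant only for illustration.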

Furthermore, the correlators 3.40 are singled out for two 
theoretical reasons which reflect the fundamental structures of, 
respectively, quantum field theory and QCD. This has far 
reaching consequences. 

First, such correlators naturally fit into the general structure 
of quantum field theory where the apparatus of multiparticle 
correlators is intimately related to the fundamental formalism 
of Fock space and is central in quantum field theory and statis- 
tical mechanics [19] because it allows one to systematically de- 
scribe systems with a fluctuating number of particles (as is the 
case e.g. with multiparticle events in high-energy physics 
experiments). 

Second, the energy correlators 3.40 are directly expressed in 
terms of the fundamental energy-momentum tensor [5] (we do 
not need explicit expressions here). This allows one to directly 
address the well-known problem that predictions of pQCD are 
formulated in terms of quark and gluon fields whereas experi- 
mental data deal with the observed hadronic degrees of free- 
dom. Indeed, the energy-momentum tensor is determined solely 
by the space-time symmetries of QCD. It is thus independent of 
a particular operator basis used to represent the theory (quark 
and gluon fields, or hadronic fields) and so absorbs all the un- 
known complexity of confinement and hadronization. There- 
fore, observables which are expressible in terms of the energy- 
momentum tensor can be computed either in terms of hadronic 
degrees of freedom or from perturbative quarks and gluons. For 
such observables, the criterion of infrared safety (cancellation 
of singular logarithms, etc.) reduces to verification of existence 
of the energy-momentum tensor in QCD as an operator object. 
We will return to this in Sec. 4.1, and here only note that the 
described way of reasoning clarifies the conjecture of [8] that 
observables for which pQCD predictions make sense are those 
for which infrared and collinear singularities cancel thus
ensuring their insensitivity to non-perturbative physics. Such a
cancellation is guaranteed for the energy correlators 3.40
(provided the angular function $f_n$ satisfies some additional
regularity restrictions; see Sec. 4.1) whereas the general
C-continuous functions are approximated by sums of energy
correlators in such a way that the properties of fragmentation
invariance etc. required for such cancellations are not
compromised.

The direct connections of the energy correlators 3.40 with 
QFT and QCD reflect the physical nature of the phenomena 
concerned and ensure their superior amenability to theoretical 
studies (cf. the abundance and quality of theoretical calcula- 
tions for the simplest shape observable thrust [20] and 
the method of a systematic study of power corrections outlined 
in [16]). 

Generalized shape observables 3.41 

A few comments are in order concerning generalized shape 
observables. These are essentially arbitrary C-continuous func- 
tions. They are obtained from energy correlators using alge- 
braic operations and appropriate limiting procedures which do 
not violate the property of C-continuity (theoretically, it is
sufficient to ensure a uniform convergence on $\mathcal{P}$ in the sense of
3.38). Roughly speaking, such operations are applied to
observables as a whole (i.e. after averaging over all events) and
they should not allow arbitrary growth of the rate of variation
of the component functions 3.19 for $N \to \infty$. An example of a
correct limiting procedure is the minimization over the thrust 
direction involved in the definition of thrust; cf. 4.68. For an 
example of illegal sum see sec. 6.9 of [4]. 

It is clear a priori that if quantum field theory is a funda- 
mental mechanics governing the phenomena observed in high 
energy physics, then it should be possible to express any truly 
observable phenomena (unlike artifacts such as instabilities) in 
a QFT-compatible language, i.e. via observables that can be 
approximated by energy correlators. This was the original ra- 
tionale behind the theory of [3]-[5]. 

Unfortunately, even in one dimension the simplest polynomial
approximations in the spirit of the Weierstrass theorem
(polynomial interpolation formulas) are seldom sufficient:
spline approximations built by gluing local polynomials are
generally more useful. This is even more so in infinitely many
dimensions (as is the case with $\mathcal{P}$), whence the need for
special tricks such as jet algorithms. An array of prescriptions
allowing one to simulate conventional jet-based observables such as
dijet mass distributions in the language of C-continuous
observables was described in [4]. Such prescriptions altogether
circumvent representation of events in terms of jets.

However, the purpose of the examples presented in [4] was
primarily to demonstrate mathematical mechanisms ensuring
that information extracted via generalized shape observables
contains features that can be directly related to the
conventional procedures (such as $\delta$-spikes in the so-called spectral
discriminators corresponding to multi-jet substates). For
practical purposes, it may be more convenient to start from the
conventional observables and try to eliminate C-discontinuities
which spoil optimality of observables. Prescriptions for doing
so are described later on in this paper.

In what follows we will be using the term C-continuous
observables as less ambiguous than generalized shape
observables.

Basic shape observables and physical information 3.42



Ideal physical information content of the event P is identified with 
the collection of values of all basic shape observables, i.e. with the 
r.h.s. of Eq.3.12. 

3.43 



The adjective "ideal" reminds us that in practice only a
finite subset of the collection is used, as in 3.14. One should feel
no more psychological discomfort with such a collection than
with, say, a transcendental number such as $\pi$ which is
completely specified by an infinite number of digits but in practice
represented by their finite sequences.

By identifying the information content of the event P with 
the collection of expressions 3.36, not only have we not devi- 
ated from the experimental reality but we have actually re- 
turned closer to it compared with the r.h.s. of 3.6 (if only by 
allowing finite angular resolutions), at the same time giving it 
a systematic form which is convenient for the derivation of the 
jet definition (Section 5). 

Observables in QCD. Dynamical aspects 4 

Now we turn to dynamical (i.e. QCD-specific) considera- 
tions in the construction of optimal observables according to 
Eq.2.17 in the context of hadronic events produced in high
energy physics experiments. The big problem is that such events
contain $O(100)$ particles described by 3 degrees of freedom
each. On the other hand, the underlying physics is controlled
by a few Standard Model parameters, whereas all the
complexity of hadronic events is supposed to be generated by the
QCD Lagrangian that contains only one coupling $\alpha_s$ and quark
and gluon fields most of which can be regarded as massless.
This means that from the viewpoint of studies of both the 
Standard Model and QCD Lagrangian most of the observed 
degrees of freedom are physically not important. In the lan- 
guage of the theory of optimal observables (Sec. 2.7), one could 
say that the optimal observables for extraction of the Standard 
Model parameters are mostly sensitive to a few degrees of 
freedom which the conventional wisdom identifies with the 
representation of events in terms of jets. 

The chain of reasoning presented below is intended to make 
more explicit, and thus help to clarify the argumentation of the 
theory of jets including the part about inversion of hadroniza- 
tion. Much of the argumentation is familiar but phrased in a 
more formal language to facilitate a systematic investigation. 

Since we are interested in issues such as hadronization, the 
perturbation theory discussed below only concerns QCD. Elec- 
troweak effects are assumed to be taken into account in the 
theoretical amplitudes as necessary. 

The basic conjecture of the QCD theory of jets 4.1



The conjecture of Sterman and Weinberg [8] is that the
property of infrared safety of observables ensures their
calculability within the framework of pQCD. We would like to
express it in a formal fashion and to connect the notion of IR
safety with C-continuity (Sec. 3.31).

Ultimately, one needs (quasi-) optimal observables
(Sec. 2.25) to extract the values of fundamental parameters
such as the mass of the W boson from hadronic data. Because
of the large dimensionality of observed hadronic events P, one
needs some specific structural information about the ideal
probability density $\pi(P)$ of their production (ideal = not taking
into account detector errors). Such information is obtained
from pQCD which deals, however, not with hadronic but with quark
and gluon degrees of freedom.

The conclusions of [8], [5] (also see Sec. 3.39 above) can be
summarized as follows. For any C-continuous observable $f(P)$
it is correct to compute theoretical predictions within the
framework of pQCD. Formally:



So, one can represent an event either in terms of particles, 
Eq. 3.6, or in terms of values of basic shape observables, cf. 
3.12. In the absence of detector errors and other imperfections, 
the two representations are numerically equivalent. 

However, the structure of detector errors is an essential part 
of physics, and in this respect the two representations differ: 
the numbers which constitute the r.h.s. of 3.12 individually 
possess the correct stability properties with respect to small 
distortions of the event — distortions of the kind specific to the 
type of measurements we deal with. The numbers which con- 
stitute Eq. 3.6, on the contrary, do not possess this property. 

For clarity's sake, suppose the event has two sufficiently
energetic particles a and b whose directions are close. Then
replacing the pair a, b with one particle c whose 3-momentum
is the sum of the 3-momenta of a and b is deemed to distort
the calorimetric physical information carried by the event only
a little (the less the difference between $\hat{p}_a$ and $\hat{p}_b$, the less
the distortion). The individual numbers which constitute the
r.h.s. of 3.6 do not have this property: they can exhibit
non-negligible chaotic fluctuations even if the physical information
content of the event varies negligibly.

A simple analogy may further help to understand the role of 
continuity: imagine a ruler marked randomly instead of the 
standard ordered numbering. Representing length by using a 
number obtained from such a ruler would be not dissimilar to 
representing the event via 3.6: it would be sufficient for book- 
keeping purposes, but it would require great care in construc- 
tion of data processing algorithms such as computation of vol- 
umes, prices, etc. 

Similarly, whereas the representation 3.6 is convenient for 
book-keeping purposes, one should avoid relying on its form in 
the design of data processing algorithms. Such algorithms 
should in general respect additional restrictions not reflected in 
3.6, namely, the restriction of C-continuity. The difficulties en- 
countered by the experts in jet definition (such as a lack of 
fragmentation invariance of some suggestions related to jet al- 
gorithms) are often artifacts due to a failure to reason about 
jets and energy flows in terms which correctly reflect the 
physical nature of the problem. 

Note in this respect that all the seemingly abstract notions 
which we introduced (events as measures on the unit sphere, 
C-continuity, etc.) are essentially only notations, i.e. formulaic 
expressions of what is. 

In fact, these notions are neither more abstract nor difficult 
than, say, the differential calculus. But they are usually taught 
as "advanced" topics in the abstract courses of functional 
analysis without link to applications, which earns them a bad 
reputation among physicists. Then when these notions are ac- 
tually encountered there is a psychological tendency to reject 
them as too abstract to be useful in practical physics. 

We are ready to take a philosophical look at Eq. 3.12: 



F.V.Tkachov hep-ph/9901444 [2 nd ed.] A theory of jet definition 2000-Jan-10 02:44 Page 16 of 45 



$\int \mathrm{d}P \, \pi(P) f(P) = \int \mathrm{d}p \, \pi^{(N)}_{\mathrm{pQCD}}(p) f(p) + O(\alpha_s^{N+1}).$    4.2



Here $\pi(P)$ represents the exact probability density so that the l.h.s.
is what experiments would see given ideal detectors and infinite
statistics. The variable $p$ on the r.h.s. represents perturbative quark
and gluon final states. $\pi^{(N)}_{\mathrm{pQCD}}$ is the corresponding probability
density computed within the shown precision in $\alpha_s$ from pQCD; it is a
sum of contributions proportional to $\alpha_s^n$, $n = 0, \ldots, N$.



(i) The mathematical structure of events in the two
expressions is essentially the same from the viewpoint of data
processing (Eq. 3.6), the difference being in the number of particles
($O(100)$ for P and $O(1)$ for $p$).

(ii) The restriction of C-continuity is important for the valid- 
ity of 4.2 in so far as fragmentation invariance and regularity 
properties (a continuity and related stronger regularity restric- 
tions; cf. Sec. 4.3) have to be formulated non-perturbatively for 
the non-perturbative expression on the l.h.s. 

(iii) Quark-hadron duality. The proposition that it
is possible to replace a sum over hadronic states by the
corresponding partonic sum is known as the hypothesis of
quark-hadron duality. The scenario of derivation of 4.2 described in
Sec. 3.39 (see also Sec. 4.3) circumvents such direct
replacement via an intermediate representation in which only an
average over the initial state of a product of energy-momentum
tensor densities is involved.

However, there is also the dynamical aspect, namely, that 
the perturbation theory would actually work in a numerically 
satisfactory fashion. This cannot be explained by reference to 
the energy-momentum tensor per se but is made possible by 
such a representation as it allows application of the usual 
renormalization group argument exactly as in the case of total 
cross sections. 

Still, a reference to renormalization group is insufficient
inasmuch as the convergence of the expansion on the r.h.s. of 4.2
depends on the behavior of the observable $f$. The easiest way
to see this effect is by looking at the so-called
power-suppressed corrections that are parametrized in terms of
coefficients not predictable from perturbation theory.^{k} A little
experience with asymptotic expansions of integrals of perturbation
theory^{l} makes it obvious that such corrections are proportional
to angular derivatives of the observables $f$ in 4.2: for $f$ which
vary too fast at too many points of the phase space the
perturbative expansion would not work. This confirms the notion that
perturbation theory cannot predict small-scale angular
correlations in observed events.



A scenario of formal verification of Eq. 4.2 4.3



(Readers not interested in formal aspects may skip the technical 
details below and go directly to Sec.4.9.) 

k Cf. the studies of such corrections in the theory of QCD sum rules [21].

l Cf. a systematic scenario described in [16] based on the expansion
method of the so-called asymptotic operation [22], [23] which is directly
formulated in terms of $\delta$-functional counterterms, so that corrections
suppressed by powers of the total energy involve derivatives of $\delta$-functions
(power counting mechanisms ensure that higher power-suppressed
corrections are accompanied by more derivatives on $\delta$-functions). After
integrations with $f$, the derivatives are switched from $\delta$-functions to $f$.

As long as the construction of perturbation theory is performed in
an axiomatic fashion rather than derived from non-perturbatively
formulated fundamental equations (as is done in conventional
treatments [19]), any proof of 4.2 is bound to be only a more or less
plausible scenario because the non-perturbative l.h.s. is, essentially,
a theoretical fiction. So if it were possible to accurately verify 4.2,
one would have to do so roughly as follows.

The first step would be to establish 4.2 for $f$ that are energy
correlators, Eq.3.40. One would start with a non-perturbatively defined
expression (the l.h.s. of 4.2), represent it in terms of correlators of
the energy-momentum tensor densities as explained in [5], and then
develop an expansion in $\alpha_s$, ending up with the r.h.s. of 4.2.

A technical subtlety is that owing to the singularities of pQCD
the integrals on the r.h.s. of 4.2 are well defined only for $f$ that obey
somewhat stronger regularity restrictions than a mere continuity.

The simplest illustration for this can be borrowed from pQCD
where a typical object in theoretical answers is the so-called
+-distribution, $(1-x)_+^{-1}$ (see e.g. [24]; here $x$ is the parton fraction but
the analytical mechanism being demonstrated is completely general).
This distribution is defined by its integration properties:

$\int_0^1 \mathrm{d}x \, (1-x)_+^{-1} f(x) = \int_0^1 \mathrm{d}x \, \frac{f(x) - f(1)}{1 - x}.$    4.4



For the r.h.s. to be a well-defined integral, it is not sufficient that
$f(x)$ is merely continuous, i.e. $f(x) \to f(1)$; one must also assume
that $f(x)$ approaches $f(1)$ sufficiently fast, e.g.

$f(x) - f(1) = O(|1 - x|).$    4.5

This is satisfied e.g. if $f$ has continuous first derivatives.
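
The role of the restriction 4.5 can be seen in a short numerical experiment. The sketch below is illustrative only: the midpoint rule, the step counts and the two test functions are assumptions chosen for the example. The function $x^2$ satisfies 4.5 and gives the exact value $-3/2$ for the r.h.s. of 4.4, whereas `rough` is continuous at $x = 1$ but approaches its limit only logarithmically, so the estimates keep growing as the grid is refined.

```python
import math

def plus_integral(f, steps=100_000):
    """Midpoint-rule estimate of the r.h.s. of 4.4:
    the integral over [0, 1] of (f(x) - f(1)) / (1 - x)."""
    h = 1.0 / steps
    f1 = f(1.0)
    return h * sum((f((i + 0.5) * h) - f1) / (1.0 - (i + 0.5) * h)
                   for i in range(steps))

def rough(x):
    """Continuous on [0, 1] with rough(1) = 0, but violating 4.5:
    rough(x) = 1 / (1 - ln(1 - x)) tends to zero only logarithmically."""
    return 0.0 if x >= 1.0 else 1.0 / (1.0 - math.log(1.0 - x))

# f(x) = x**2: (x**2 - 1) / (1 - x) = -(1 + x), whose integral is -3/2
smooth = plus_integral(lambda x: x * x)

# the same estimate for the merely continuous function on two grids
rough_coarse = plus_integral(rough, 1_000)
rough_fine = plus_integral(rough, 100_000)
```

The value `smooth` reproduces $-3/2$, while `rough_fine` exceeds `rough_coarse` by a finite amount and would continue to grow (roughly like the logarithm of the logarithm of the number of steps) on finer grids.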

The technical regularity restrictions on observables $f(P)$ required
for the r.h.s. of 4.2 to be well-defined are multi-dimensional analogs
of the restriction 4.5. For practical purposes it is sufficient to require
e.g. that the angular functions $f_n$ in 3.40 have continuous first
derivatives.^{m} (Ref. [17] formulated the restrictions in a slightly more
general form of the Hölder condition, but in the language of the
component functions 3.19.)

That this regularity restriction does not become more stringent in 
higher orders of perturbation theory follows from the fact that the 
severity of neither soft nor collinear singularities in QCD increases in 
higher orders of perturbation theory (cf. [17], [25]; this property is 
related to renormalizability of QCD). But even if it did, it would not
be an obstacle for the theory: one would only have to require that
observables are smooth (i.e. belong to the class $C^\infty$).

The second step would be to extend Eq.4.2 to more general
observables than finite sums of energy correlators. This, as is usual
in situations of this sort, would be accomplished by a limiting
procedure with respect to $f$ which would commute with the limit
$\alpha_s \to 0$. To this end, one has to rewrite the mentioned regularity
conditions in a non-perturbative form. For instance, an analog of 4.5
could be



$|f(P) - f(P')| \leq K_f \, \mathrm{Dist}(P, P')$ for any $P, P' \in \mathcal{P}$,    4.6

where Dist is defined in 3.27. One sees that if $f$ is an energy
correlator 3.40 then Eq.4.6 implies that the angular function $f_n$ satisfies
an analog of 4.5. Then one would define the norm

$\|f\| = \max_P |f(P)| + K_f,$    4.7

and define the space $C^1(\mathcal{P})$ as the corresponding closure of the
subspace spanned by energy correlators satisfying 4.6. This would be
similar to the standard functional class $C^1$. Recall that functions
from the class $C^1$ can be uniformly approximated by polynomials
together with their first derivatives. Observables from $C^1(\mathcal{P})$ can
similarly be approximated by finite sums of energy correlators
except that our formulation is in terms of 4.6 instead of derivatives in
the argument P for purely technical reasons.^{n}

m A similar technical assumption (existence of continuous derivatives of
$f$ through second order) will be made in the derivation of the key bound
for jet definition in Sec.6.10.

Finally, one would need inequalities of the form

$\left| \int \mathrm{d}P \, A(P) f(P) \right| \leq C_A \|f\|$    4.8

for $A = \pi$, $\pi^{(N)}_{\mathrm{pQCD}}$, $\pi - \pi^{(N)}_{\mathrm{pQCD}}$. In the first case ($A = \pi$) an even
weaker inequality is expected to be true (with the norm defined
without the second term on the r.h.s. of 4.7). In the second case the
inequality is essentially equivalent to the proposition that the soft
and collinear singularities in individual diagrams of pQCD are never
more severe than logarithmic. In the third case one would also have
to verify that $C_A = O(\alpha_s^{N+1})$ (such a proposition is unlikely not to
be true).

$f(P) \approx f(Q)$ for any $P \in \Pi$ and for any C-continuous $f$.^{p}    4.10

$\Pi$ contains events with an arbitrarily large number of particles
but the number of particles is always positive and so limited
from below by a minimum value. Choose any Q from $\Pi$ so that
its number of particles is equal to the minimum value. (This
need not fix Q uniquely.) Then the condition (ii) implies that
$\pi^{(N)}_{\mathrm{pQCD}}(Q)$ is significant, i.e. that Q cannot contain more than
a few particles if perturbation theory works because emission
of each additional particle is then suppressed by an additional
factor $\alpha_s$.

That events from $\Pi$ are close to Q in the sense of the
C-convergence (as measured e.g. by the distance 3.27) means that
such events consist of a few more or less narrow energetic
sprays of particles (each spray roughly corresponding to a
particle of Q) and perhaps some soft background, i.e. randomly
directed particles which together carry a small fraction of the
event's energy.

All in all, there does not seem to exist any analytical mechanism
which might invalidate any of the listed propositions because of the
intrinsic analytical naturalness of the described scheme. Although
some technical details (e.g. the description of regularity conditions)
might need to be made more precise, the basic requirement of
C-continuity fits into the general scheme of things so tightly and natu-

rally, from the viewpoints of both physics and mathematics, that it Formal model of hadronization 4 1 1 

seems unlikely that it could require a modification. 

One adopts the following theoretical model for the prob- 

.I^.. m . e .^?. n .^ m ..^. h .!"!?J.^:l-.? . 4 £ ability distribution of events P which is built into any Monte 

Let us now try to understand the structural reasons behind Carlo event generator. 

Eq ' 4 ' 2 ' *(P)« Jdp4 n Q ) CD (p)x// (n) (p,P), 4.12 
The reasoning below will be more transparent if one bears i 

in mind that discussing C-continuous functions defined on 2> is where H w (p> P) is the probability for the parton event p to de- 

rather similar to discussing ordinary continuous real functions velop into the observed event P 

f(x ) defined on the simplex S , the part of the euclidean space m . ,..,,„• • , ■ , . 

i, • The approximate equality in 4.12 is meant to indicate that it 

R described by xt >0, L; xt ^ 1 (the latter restriction is anal o- . ... ,, ,., , , B -. , ., 

, . is not a theorem that the probability 7C(r) can be exactly repre- 

gous to 3.16). The distance 3.27 is similar in general properties . ,. , . „ „„. lU n/inmt 

b , , ,. , ,. . „„ , , , , ,. : sented in such a convolution form. (With 0(100) free parame- 

to the usual euclidean distance in K although the explicit ex- . TT i„) , , , „ „ . 

. . fc , ters in W' the error can be made very small, of course.) 
pression is rather different. However, it is exactly this ditfer- 

ence that masks the dissimilarity of the infinitely dimensional Note the following normalization restriction: 

space of measures on the unit sphere from the ordinary euclid- fdP// <n) (p P) = l 4 13 

ean space R" and thus makes possible the analogy between the J 

events Pe 2> and the vectors xe S . T h e hadronization kernel H {n) must depend on n if the r.h.s. 

The C-continuous observable/ in Eq. 4.2 is in principle ar- of 4. 1 2 as a whole is to represent the exact non-perturbative 

bitrary and so can probe any small region in 2> (smallness can answer. In practice n is fixed and small. q 

be measured using the distance 3.27). Let n be a region of <B . ffigher perturbative terms ^ t0 be added to ,W , 
such that: 

whereas H n ' — which is supposed to express the effect of the 

(i) II is small, i.e. any two events from II differ by slightly , •> 

acollinear fragmentations into/recombinations of, any number entire sum of missin g terms ~ acts on ^pQCD multiplica- 

of particles. Formally, one can say that the distance Dist be- tively. It would be interesting to clarify this point in a system- 

tween any two events from LI is small (i.e. « 1). atic manner. 

(ii) Events from n are produced with a relatively significant Represent the l.h.s. of 4.2 in terms of 4.12: 

probability formally given by I dP^(P)0(P is from IT) . r r , > r , . 

JdP^(P)/(P) = jdp4" ( > CD (p)jdPHW(p,P)/(P). 4.14 

The condition (i) means that for any fixed event Q from IT, 

one would have: This agrees with the r.h.s. of 4.2 if 

JdP ff (n) (p, P)/(P) = fm + 0(af l ) . 4.15 

The approximate equality here is supposed to be valid for any 

° For instance, note the curious fact that elements of the tangent space to 2> C-continuous function /. For this reason, Eq.4.15 can be con- 

at any point P are distributions on the unit sphere. In other words, the tan- veniently represented as follows" 
gent space (a natural habitat of the differentials dP) is different from the 
complete linear space to which it is tangent. One would probably have to 
develop a differential calculus for functions on (P and reformulate 4.6 in 
terms of the space C'(fP) if QCD were non-renormalizable because then 
one might need to require smoothness (the property C") of all the func- 
tions /'(P) involved. 

° This has the same power-counting reasons behind it as the renormaliza- 
bility ofpQCD. 



p At this point we don't discuss how the approximation error depends on /, 
etc. See Sec. 5. 17. 

q Strictly speaking, the convolution 4.12 ought to be performed at the level 
of quantum amplitudes rather than probabilities but we ignore such details 
here. 
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H <n \p,P) = S(p,P) + 0(a n s +l ), 4.16 



where the 5-function on the r.h.s. is defined in the usual way 
with respect to integration over p or P. 

• Eqs.4.15, 4.16 are simply a convenient formulaic expres- 
sion of the verbal statement that observed events generated 
from the partonic event p mostly consist of narrow jets that re- 
semble the parent partons. 

Taking into account the normalization 4.13 and the fact that $f$ is arbitrary (apart from the general restriction of C-continuity), one deduces from 4.15 that for P typically generated from p according to $H^{(n)}$, one has

$f(P) = f(p) + O(\alpha_s^{n+1})$.    4.17

This is another form of the proposition that P is similar to p. The meaning of similarity is established by the restriction of C-continuity imposed on the functions $f$ that are allowed in 4.17.

• If one defines a configuration of jets Q to roughly correspond to the partonic event p then Eq.4.17 implies that Q should satisfy the relation $f(Q) \approx f(P)$. We will see it again in Sec. 5.6.

Sensitivity to hadronization and C-continuity 4.18 

We saw in Sec. 2.48 that optimal observables are C- 
continuous as a result of the smearing caused by detector errors 
described by 2.49. C-continuity made observables less sensi- 
tive to such errors. In the present dynamical context, we note 
that hadronization is described by a similar convolution 4.12, 
and for C-continuous observables fluctuations induced by the 
stochastic hadronization are suppressed too. 

• Actually, Eq.4.2 means that C-continuity makes observables insensitive (within the precision of perturbative approximation) to the hadronization effects which transform the perturbative $\pi^{(n)}_{\mathrm{pQCD}}(p)$ into the hadronic $\pi(P)$. Remember that the mentioned precision of perturbative approximation depends on the magnitude of derivatives of the observable.

Formal construction of optimal observables 4.19 



Suppose we wish to measure a fundamental parameter M such as the mass of the W boson. All the dependence on such a parameter is localized within $\pi^{(n)}_{\mathrm{pQCD}}$. Then we can combine 2.17 and 4.12 and write down a formal expression for the corresponding optimal observable:

$f_{\mathrm{opt}}(P) = \partial_M \ln \pi(P) = \dfrac{\int dp\, [\partial_M \pi^{(n)}_{\mathrm{pQCD}}(p)] \times H^{(n)}(p, P)}{\int dp'\, [\pi^{(n)}_{\mathrm{pQCD}}(p')] \times H^{(n)}(p', P)}$.    4.20

• The philosophical importance of this expression is that it 
corresponds to the fundamental Rao-Cramer limit on the at- 
tainable precision for the values of M extracted from a given 
data set (recall Sec. 2.7 and the comments after 2.22). There- 
fore Eq.4.20 is an ideal starting point for deliberations about 
any data processing algorithms (including jet algorithms) 
geared towards specific precision measurement applications. 



The key difficulty is that neither the probability distribution 4.12 nor the formula 4.20 can be evaluated for a given P due to the huge number of degrees of freedom in P (in reality, theoretical versions of $\pi(P)$ as a whole are materialized only in the form of Monte Carlo event generators). This means that a careful choice of parametrization of events is needed before the construction of good approximations to $f_{\mathrm{opt}}$ becomes possible.

• With a suitable parametrization, Eq.4.20 could be used in a brute force fashion: one would map the events into a multi-dimensional domain of the chosen parameters (say, q), build a multi-dimensional interpolation formula for $\pi(P(q))$ (via an adaptive routine similar to those used e.g. in [11]) for two or more values of M near the value of interest, and perform the differentiation in M numerically. The resulting multi-dimensional interpolation formula would represent the optimal observable mapped to q and could be used for the processing of experimental events to complement the standard $\chi^2$ method based on histogramming (recall the comments in Sec. 2.41).

The difficulty is to find a parametrization that would not in- 
volve a significant loss of information about M. 
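
The brute-force recipe described above (evaluate the density at nearby values of M and differentiate numerically) can be illustrated on a toy case. The sketch below is not taken from the paper: it assumes a one-dimensional parametrization q, and it replaces the multi-dimensional interpolation formula for $\pi(P(q))$ by an analytic Gaussian stand-in. The names `log_density` and `optimal_observable` are hypothetical.

```python
import math


def log_density(q, M, sigma=1.0):
    """Toy stand-in for ln pi(P(q)) at parameter value M (assumption: Gaussian in q)."""
    norm = sigma * math.sqrt(2.0 * math.pi)
    return -0.5 * ((q - M) / sigma) ** 2 - math.log(norm)


def optimal_observable(q, M0, dM=1e-3, log_pi=log_density):
    """Central-difference estimate of d/dM ln pi(q; M) at M = M0.

    Only two evaluations of the density (at M0 - dM and M0 + dM) are needed,
    which mirrors building the interpolation formula for two values of M.
    """
    return (log_pi(q, M0 + dM) - log_pi(q, M0 - dM)) / (2.0 * dM)


if __name__ == "__main__":
    # For the Gaussian toy model the exact answer is (q - M0) / sigma**2.
    print(optimal_observable(2.5, 2.0))
```

In a realistic setting `log_pi` would be the logarithm of an interpolation formula fitted to Monte Carlo output, and the choice of `dM` would have to balance statistical noise against the curvature in M.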

Usually employed are parametrizations obtained by de- 
scribing the events P in terms of a few jets, which is made 
possible by the specific structure of 4.20, namely: 

(i) Eq.4.16 means that observed events P are close (in the 
sense of C-continuity as measured e.g. by the distance 3.27) to 
their parent parton events p. 

(ii) The dimensionality of p is small. 

Finding such a p for each observed event P amounts to an ap- 
proximate inversion of hadronization. This will be further dis- 
cussed in Sec. 4.28. Here we would like to take a slightly dif- 
ferent view on the problem. 

If one could restore p from P uniquely, then the optimal observable would be identified with its perturbative version:

$f^{(n)}_{\mathrm{opt}}(p) = \partial_M \ln \pi^{(n)}_{\mathrm{pQCD}}(p)$.    4.21

However, the perturbative probability density $\pi^{(n)}_{\mathrm{pQCD}}(p)$ contains singular expressions (generalized functions such as the one represented by Eq.4.4) that are not positive-definite beyond the leading order.$^{r}$ This means that the perturbative expression $\pi^{(n)}_{\mathrm{pQCD}}(p)$ cannot be immediately interpreted as a probability density. As a result, the derivation of optimal observables described in Sec. 2.7 is inapplicable. In other words, the expression 4.21 is formal beyond the tree approximation of pQCD.

Nevertheless, it is not impossible to use Eq.4.21 for the construction of quasi-optimal observables provided one could find a natural way to extend it (or some simplified version of it) to all events P by C-continuity. (Remember that the formal nature of p and P is the same.) Such an extension can sometimes be accomplished in such a way that the problem of restoring p from P does not occur. Here is an example.

Constructing observables via extension by C-continuity. Precision measurements of $\alpha_s$ 4.22

Consider measurements of the strong coupling $\alpha_s$ in the process $e^+e^- \to$ hadrons. We are going to show how the concept of optimal observables could have been employed to obtain shape observables that best suit this purpose.



r If $\varphi(x) > 0$ and $\varphi(x) = \varphi_0 + \varphi_1 x + \varphi_2 x^2 + \ldots$ then $\varphi_0 > 0$ (if it is non-zero) but the sign of $\varphi_1$ etc. may be arbitrary.

At a very crude level of reasoning, the probability density can be represented as a direct sum

$\pi_{\mathrm{pQCD}}(p) = \{\pi_2(p)\}_2 \oplus \{\alpha_s \pi_3(p)\}_3 \oplus \{\alpha_s^2 \pi_4(p)\}_4 \oplus \cdots$,    4.23

where each term corresponds to the n-particle sector of the space of p. Each $\pi_n$ also contains $O(\alpha_s)$ corrections.

Use the prescription 4.21 with $M \to \alpha_s$, multiply the r.h.s. by $\alpha_s$ (because $f_{\mathrm{opt}}$ is defined up to a constant), and drop higher-order terms in each sector, which is not prohibited by the prescriptions of Sec. 2.25. Then one obtains:

$f_{\mathrm{pragm}}(p) \sim \{0\}_2 \oplus \{1\}_3 \oplus \{2\}_4 \oplus \cdots$    4.24
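
The integer weights in 4.24 can be checked sector by sector. The short derivation below is a sketch under the assumption, read off from 4.23, that the n-particle sector has the form $\alpha_s^{n-2}\pi_n$ with $\pi_n = \pi_n^{(0)} + O(\alpha_s)$; the symbol $\pi_n^{(0)}$ for the $\alpha_s$-independent leading term is introduced here only for illustration.

```latex
% Prescription 4.21 with M -> \alpha_s, multiplied by \alpha_s, applied to the
% n-particle sector \alpha_s^{n-2}\,\pi_n(p) of 4.23:
\alpha_s\,\partial_{\alpha_s} \ln\!\left[\alpha_s^{\,n-2}\,\pi_n(p)\right]
  = (n-2) + \alpha_s\,\partial_{\alpha_s} \ln \pi_n(p)
  = (n-2) + O(\alpha_s), \qquad n = 2, 3, 4, \ldots
% Dropping the O(\alpha_s) terms gives the weights 0, 1, 2, ... of 4.24.
```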

In other words, a minimal requirement is that the observables should vanish on 2-particle events. This is exactly the requirement which was used in [3], [4] to derive the so-called jet-number discriminators $J_m[P]$ by assuming the simplest analytical form for $f_{\mathrm{quasi}}$ (a 3-particle correlator). The simplest C-continuous expression corresponding to the above requirement then is

$f_{\mathrm{quasi}}(P) = J_3[P]$.    4.25

(See [3], [4] for exact expressions.)

Remember, however, that there is an arbitrariness in the construction of $J_m[P]$ in [3], [4]: the factors $\Delta_{ij}$ involved in the construction are only required to behave as $O(\theta_{ij}^{c})$ with a positive c as $\theta_{ij} \to 0$ ($\theta_{ij}$ is the angle between the i-th and j-th particles of the event). The simplest analytical behavior corresponds to c = 1 whereas the simplest covariant expressions correspond to c = 2.

• It might be possible to fix this arbitrariness, as follows. The perturbative expression for $\pi_3$ is singular and not strictly non-negative exactly in situations corresponding to $\theta_{ij} \to 0$. To rectify this one could perform a resummation of the perturbation series, thus introducing a non-trivial dependence on $\alpha_s$. Then the differentiation in 2.17 would replace the 1, 2, ... in 4.24 with something more interesting (i.e. dependent on $\alpha_s$) in the region $\theta_{ij} \to 0$. Then by examining how such a dependence affects the result of differentiation in $\alpha_s$ in the definition of $f_{\mathrm{opt}}$, one might be able to modify the observable 4.25 accordingly. This interesting theoretical problem seems to require a kind of pQCD expertise similar to that behind the $k_T$-algorithm [32].

The increasing integer weight in each sector in 4.24 corresponds to the simple fact that higher powers of $\alpha_s$ are increasingly more sensitive to its variations. So the expression 4.24 suggests replacing 4.25 with a sum similar to the following one:$^{s}$

$f_{\mathrm{quasi}}(P) = J_3[P] + 2 J_4[P] + 3 J_5[P] + \ldots$    4.26

Of course, the series cannot contain more terms than the number of theoretically known corrections to $\pi_{\mathrm{pQCD}}$.

Actually, any conventional shape observable that vanishes (only) on 2-particle configurations meets the above requirement. For instance, one such shape observable is the combination $1 - T$, where $T$ is the so-called thrust (eq.(46) in [1] and refs. therein). In our notations (we assume that the event's total energy is normalized to 1), the explicit expression is

$1 - T(P) = 1 - \max \left[ \sum_a E_a |\cos\theta_a| \right] = \min \left[ \sum_a E_a (1 - |\cos\theta_a|) \right]$,    4.27

where $\theta_a$ is the angle between the a-th particle's direction and an axis (the thrust axis). The optimizations are performed with respect to the direction of the thrust axis, which determines the orientation of the angular function as a whole but does not affect the magnitude of derivatives; this ensures C-continuity of $1 - T(P)$.

s Recall that jet-number discriminators are normalized so that their maximal value, reached on configurations with no less than m widely separated particles, is equal to 1.

In terms of the classification of Sec. 3.35, the definition 4.27 
belongs to the class of generalized shape observables because 
it involves an optimization procedure on top of a basic shape 
observable. 
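
To make the optimization in 4.27 concrete, here is a minimal sketch of an exact evaluation of $1 - T$ for events with a small number of particles. It is not the paper's implementation. It relies on the standard identity that the maximum over axes of $\sum_a E_a |\cos\theta_a|$ equals the maximum over sign assignments $s_a = \pm 1$ of $|\sum_a s_a E_a \hat{p}_a|$, and it assumes massless particles given by energies and unit direction vectors. The function name `one_minus_thrust` is hypothetical.

```python
import itertools
import math


def one_minus_thrust(energies, directions):
    """Return 1 - T for an event given as energies and unit 3-vectors.

    Exact but exponential: all 2^(N-1) sign assignments are enumerated,
    so this is meant for illustration with small N only.
    """
    total = sum(energies)
    momenta = [(e * d[0], e * d[1], e * d[2]) for e, d in zip(energies, directions)]
    best = 0.0
    # The first sign is fixed to +1 because flipping all signs changes nothing.
    for signs in itertools.product((1.0, -1.0), repeat=len(momenta) - 1):
        sx, sy, sz = momenta[0]
        for s, (px, py, pz) in zip(signs, momenta[1:]):
            sx += s * px
            sy += s * py
            sz += s * pz
        best = max(best, math.sqrt(sx * sx + sy * sy + sz * sz))
    # Dividing by the total energy removes the need to normalize it to 1 beforehand.
    return 1.0 - best / total


if __name__ == "__main__":
    back_to_back = [(0.0, 0.0, 1.0), (0.0, 0.0, -1.0)]
    print(one_minus_thrust([0.5, 0.5], back_to_back))
```

The observable vanishes on the back-to-back 2-particle event, in line with the requirement expressed by 4.24, and is positive on events with three or more well separated particles.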

The above reasoning can be regarded as an argument for quasi-optimality of the observables such as thrust and jet-number discriminators for precision measurements of $\alpha_s$. We will have to say more on this in Sec. 4.68.



Constructing observables via jet algorithms. The conventional approach 4.28



Let us explore how one could construct quasi-optimal 
observables that would approximate 4.20 using the fact that the 
majority of hadronic events P resemble their partonic parents 
p, as formally expressed by 4.16 (4.15). Although it is impos- 
sible to exactly restore the parton parent p for each 
observed event P (see after 4.32, Sec. 4.47 and Sec. 5.10), the
idea is a useful heuristic to start from. 

The conventional approach to construction of observables 
involves three elements: a jet algorithm, an event selection 
procedure which we call the jet-number cut, and a function on 
jet configurations. 

We will focus only on the general structure and properties 
of the conventional data processing scheme based on jet algo- 
rithms, and the specific form of the jet algorithm will play no 
role in the following discussion. 



General structure of jet algorithms 4.29



Assume there is a so-called jet algorithm that somehow accomplishes an approximate inversion of hadronization. Formally, such an algorithm is a mapping of arbitrary events P into similar (pseudo) events Q:

$P \xrightarrow{\ \text{jet algorithm}\ } Q = Q[P]$.    4.30



Q usually has many fewer (pseudo) particles than P. 

Recall that partonic events p have the same formal nature 
as the hadronic events P. This implies, first, that Q is an object 
of the same nature as P and p; second, that the mapping 4.30 
is defined on both hadronic and partonic events. 

We will call Q jet configurations and their pseudoparticles, jets. For clarity's sake, we distinguish jets as mathematical objects (the pseudoparticles of Q) from jets as collections of particles (hadrons or partons); for the latter we will use the terms spray or cluster, usually in informal reasoning.

Jets in Q will be labeled by the index j, and the j-th jet is characterized similarly to particles of the event P (cf. 3.6), i.e. by its energy and direction denoted as $\mathcal{E}_j$ and $\hat{q}_j$:

$Q = \{\mathcal{E}_j, \hat{q}_j\}_{j=1 \ldots N(Q)}$.    4.31

In practice both particles and jets are endowed with additional attributes, e.g. Lorentz 4-momenta, and the jet algorithm evaluates them along the way, but at this point we ignore such complications.

That the mapping 4.30 is supposed to be an approximate inversion of the hadronization described by the kernels $H^{(n)}$ in Eq.4.12 means that Q should be close to p. This can be represented as

$Q \approx p$.    4.32

The exact meaning of the approximate equality is yet to be specified, and it may be impossible to identify a single partonic configuration which hadronized into P. Still, there is a class of events for which the relation 4.32 is unambiguous (at least in the asymptotic limit of high energies), and this provides a minimal requirement which any jet algorithm must satisfy and which serves as a sort of boundary condition for jet definition which we present for explicitness' sake:

For events which consist of a few energetic well isolated narrow sprays of particles, each spray is associated with a jet whose energy and direction coincide with those of the spray.    4.33

The ambiguity of jet definition concerns how jet algorithms handle fuzzy events that do not fall into the above category.

Another important condition usually imposed on jet algorithms is that the mapping 4.30 should be fragmentation invariant. In the context of our theory this is essentially superfluous since the interpretation of events and functions on them modulo C-continuity (which incorporates fragmentation invariance; see 3.34) is built into our formalism at a linguistic level: if all the arguments are expressed in the language of 3.12 rather than the particle representation 3.6 then the resulting jet definition will be automatically fragmentation invariant.

Note that any reasonable jet algorithm sets, explicitly or implicitly, a lower limit on the angular distances between jets in Q. The limit may depend on jets' energies.

A related observation is that the mapping 4.30 cannot (unless it is trivial, i.e. $Q[P] = P$) be continuous in any non-pathological sense for some P. The points of discontinuity usually correspond to the events whose different small deformations result in jet configurations with different numbers of jets.

The jet-number cut 4.34

Another element of the conventional data processing scheme is the so-called jet-number cut, which is a selection procedure (similar to any other event selection procedure; see Sec. 2.35) based on the number of jets the chosen jet algorithm finds.

It is convenient to introduce a notation for the collection of events with a given number of jets (the K-jet sector):

$\mathcal{P}_K = \{ P \in \mathcal{P} \mid Q[P] \text{ has } K \text{ jets} \}$.    4.35

Then the space of events $\mathcal{P}$ is sliced into a sum of $\mathcal{P}_K$ for different K. The exact shapes of $\mathcal{P}_K$ depend on the chosen jet algorithm.

The jet-number cut is equivalent to inclusion into observables of a dichotomic factor of the form

$\theta(Q[P] \text{ has } K \text{ jets}) = \theta(P \in \mathcal{P}_K)$.    4.36

(The $\theta$-function is defined in 2.39.)

The value of K is chosen to enhance sensitivity to M and to 
suppress backgrounds. It is usually determined using the ap- 
proximate relation jets = partons (Eq.4.32). Then K is the 
number of partons in the final states in the lowest order of the 
QCD perturbation theory in which the dependence on the pa- 
rameters one is interested in is manifest. 



Observables 4.37

The last element of the conventional approach is a function defined on jet configurations Q which passed the jet-number cut (Sec. 4.34); denote it $\varphi_{\text{ad hoc}}(Q)$.

In practice $\varphi(Q)$ is chosen in an ad hoc fashion although once the jet algorithm is chosen then it is possible in principle to construct optimal observables for the probability distribution mapped to Q (Sec. 4.52).

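The observable on events P is then defined as follows:

$P \xrightarrow{\ \text{jet algorithm}\ } Q \xrightarrow{\ \text{jet-number cut}\ } f(P)$,    4.38

where $f(P) = \theta(Q[P] \text{ has } K \text{ jets})\, \varphi(Q[P])$.

The data processing scheme 2.5 becomes

$\left. \begin{array}{l} \pi(P) \xrightarrow{\ \text{j.a.} + \text{cut}\ } \pi^*(Q) \to \langle f \rangle \\ \{P_i\} \xrightarrow{\ \text{j.a.} + \text{cut}\ } \{Q_i\} \to \langle f \rangle_{\mathrm{exp}} \end{array} \right\} \to \alpha_s, M_W, \ldots$    4.39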



It is quite obvious that the optimal observable 4.20 cannot 
be represented in the form 4.38 with any non-trivial jet algo- 
rithm in realistic situations. This means that with such observ- 
ables it is impossible to achieve the theoretical Rao-Cramer 
limit on the precision of determination of fundamental pa- 
rameters. We will come back to this in Sec. 4.43. 



Examples 4.40

Two typical examples are as follows.

The first example is the so-called 3-jet fraction in the process $e^+e^- \to$ hadrons which used to be one of the observables employed for measurements of $\alpha_s$ at LEP1. Here one simply has:

$f_{3\,\mathrm{jets}}(P) = \theta(Q \text{ has } 3 \text{ jets})$.    4.41


The second example is a simplified (but sufficient for the purposes of illustration) version of what might be used at LEP2 to measure the mass of W in the process $e^+e^- \to W^+W^- \to$ hadrons above the $W^+W^-$ threshold, where each W decays into two jets. Here one would select events with 4 jets and choose $\varphi(Q)$ to yield an array of numbers, each being the number of jet pairs from Q whose invariant mass falls into the corresponding interval of the mass axis (bin):

$f_{\mathrm{dijets}}(P) = \theta(Q \text{ has } 4 \text{ jets}) \times [\text{no. of dijets from } Q \text{ in the } m\text{-th bin}]_{m=1,2,\ldots}$    4.42
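
The two examples can be sketched in code to show how an observable of the form 4.38 is assembled from a jet algorithm, a jet-number cut and a function on jet configurations. Everything below is an illustrative assumption rather than the paper's method: `toy_jet_algorithm` is a hypothetical stand-in for Q[P] that merges the closest pair of 4-momenta while their opening angle is below `max_angle`, and events are lists of (E, px, py, pz) tuples with non-zero 3-momenta.

```python
import itertools
import math


def opening_angle(p, q):
    """Angle between the 3-momenta of two 4-vectors (E, px, py, pz)."""
    dot = p[1] * q[1] + p[2] * q[2] + p[3] * q[3]
    norm_p = math.sqrt(p[1] ** 2 + p[2] ** 2 + p[3] ** 2)
    norm_q = math.sqrt(q[1] ** 2 + q[2] ** 2 + q[3] ** 2)
    return math.acos(max(-1.0, min(1.0, dot / (norm_p * norm_q))))


def toy_jet_algorithm(event, max_angle=0.5):
    """Hypothetical Q[P]: merge the closest pair until all pairs are separated."""
    jets = [tuple(p) for p in event]
    while len(jets) > 1:
        i, j = min(itertools.combinations(range(len(jets)), 2),
                   key=lambda ij: opening_angle(jets[ij[0]], jets[ij[1]]))
        if opening_angle(jets[i], jets[j]) >= max_angle:
            break
        merged = tuple(a + b for a, b in zip(jets[i], jets[j]))
        jets = [p for k, p in enumerate(jets) if k not in (i, j)] + [merged]
    return jets


def f_3jets(event):
    """Analog of 4.41: 1 if the toy algorithm finds exactly 3 jets, else 0."""
    return 1.0 if len(toy_jet_algorithm(event)) == 3 else 0.0


def f_dijets(event, bin_edges):
    """Analog of 4.42: per-bin counts of jet-pair invariant masses in 4-jet events."""
    counts = [0] * (len(bin_edges) - 1)
    jets = toy_jet_algorithm(event)
    if len(jets) != 4:  # the jet-number cut
        return counts
    for a, b in itertools.combinations(jets, 2):
        e, px, py, pz = (x + y for x, y in zip(a, b))
        mass = math.sqrt(max(e * e - px * px - py * py - pz * pz, 0.0))
        for k in range(len(counts)):
            if bin_edges[k] <= mass < bin_edges[k + 1]:
                counts[k] += 1
    return counts
```

The hard threshold in `toy_jet_algorithm` is where the discontinuity of the mapping P → Q enters: a small deformation of an event near `max_angle` changes the number of jets and hence the value of either observable.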



Understanding the observables 4.38 4.43



Substitute $f(P)$ defined by 4.38 into the l.h.s. of 4.2 and use 4.12. Simple formal changes of the order of integrations yield:

$\int dP\, \pi(P)\, f(P) = \int_{q \text{ has } K \text{ jets}} dq\; \pi^*(q)\, \varphi(q)$,    4.44

where

$\pi^*(q) = \int dp\; \pi^{(n)}_{\mathrm{pQCD}}(p)\, h^{(n)}(p, q)$,    4.45

with the kernel $h^{(n)}$ given by

$h^{(n)}(p, q) = \int dP\, H^{(n)}(p, P)\, \delta(q, Q[P])$.    4.46

The $\delta$-function on the r.h.s. is similar to the one in 4.16.
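
The "simple formal changes of the order of integrations" can be spelled out as follows. This is a sketch in the notation of 4.12, 4.38 and 4.46; the only extra ingredient is the insertion of unity in the form $1 = \int dq\, \delta(q, Q[P])$, which is an assumption about how the step is meant to be carried out.

```latex
\begin{aligned}
\int dP\, \pi(P)\, f(P)
 &= \int dp\, \pi^{(n)}_{\mathrm{pQCD}}(p) \int dP\, H^{(n)}(p,P)\,
    \theta(Q[P] \text{ has } K \text{ jets})\, \varphi(Q[P]) \\
 &= \int dq\, \theta(q \text{ has } K \text{ jets})\, \varphi(q)
    \int dp\, \pi^{(n)}_{\mathrm{pQCD}}(p)
    \int dP\, H^{(n)}(p,P)\, \delta(q, Q[P]) \\
 &= \int_{q \text{ has } K \text{ jets}} dq\, \varphi(q)
    \int dp\, \pi^{(n)}_{\mathrm{pQCD}}(p)\, h^{(n)}(p,q)
  = \int_{q \text{ has } K \text{ jets}} dq\, \pi^*(q)\, \varphi(q) .
\end{aligned}
```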

If one drops the jet-number cut from the definition 4.38 then 
the only change to be made is to drop the restriction on q in the 
integral on the r.h.s. of 4.44. 

Note that Eq.4.45 differs from 4.12 by the replacements $H^{(n)} \to h^{(n)}$, $P \to q$.

If the mapping $P \to Q$ corresponds to a typical jet algorithm then the domain of q is, generally speaking, the same as for P, i.e. $\mathcal{P}$ (Sec. 3.15). However, there are two differences. First, most of the probability density $\pi^*(q)$ is now concentrated on pseudoevents q with fewer particles than was the case with $\pi(P)$. Second, $\pi^*(q)$ is zero on jet configurations with some pairs of jets sufficiently close (the corresponding events P are then mapped to jet configurations with a single jet instead of such a pair; recall the comments after 4.33).

The kernel $h^{(n)}(p, q)$ is interpreted as the probability for the partonic event p to generate any hadronic event that would yield the jet configuration q after application of the jet algorithm.



On inversion of hadronization 4.47

Eq.4.45 means that the kernel $h^{(n)}(p, q)$ effects a smearing of the perturbative expression. If the complete $\pi(P)$ given by 4.12 is strictly non-negative then such must also be $\pi^*(q)$.

The latter fact has the following consequence:

Since the pQCD probability density $\pi^{(n)}_{\mathrm{pQCD}}(p)$ is not strictly non-negative near some p, the non-negativity of its smeared analog $\pi^*(q)$ implies that an exact inversion of hadronization is impossible with any jet algorithm in the form of the mapping 4.30.    4.48

This impossibility can be quantified; see Sec. 5.10.

Furthermore, the hadronization kernel $H^{(n)}$ depends on n, the order of pQCD corrections included into the perturbative probabilities $\pi^{(n)}_{\mathrm{pQCD}}(p)$ in 4.12. It is not clear which n the inversion of hadronization should be geared to.

For instance, consider radiation of a gluon by a quark. If n corresponds to the leading order (LO) approximation then the mechanism of gluon radiation is described by the hadronization kernel $H^{(n)}$. If n corresponds to the next-to-leading order (NLO) then $\pi^{(n)}_{\mathrm{pQCD}}(p)$ is a sum of LO and NLO terms, and then $H^{(n)}$ should contain contributions which dress the LO and NLO terms. This in fact is a different aspect of the same problem 4.48: Jets have to be defined at the level of perturbative quarks and gluons before a connection with observed data can be established.

Still another aspect of the same problem is in terms of non- 
uniqueness of inversion of hadronization. In general, different 
configurations of partons may result in the same hadronic 
event. This is seen e.g. from the collective nature of hadroni- 
zation (a single colored parton cannot develop into a jet of col- 
orless hadrons). It is even more true if partonic cross sections 
are evaluated in NLO approximation where a quark can radiate 



an almost collinear gluon, etc. Then for some events one must 
rely on a convention about whether such an event is a had- 
ronized LO quark, or a hadronized NLO configuration of the 
same quark and a gluon. We will come back to this point in 
Sec. 4.70. 

Lastly, from a computational viewpoint, inversion of a con- 
volution like 4.12 is in general an ill-posed problem. This 
means that even if a solution formally exists, numerical insta- 
bilities may be encountered in practice. In the present case, 
such instabilities occur near the discontinuity of the mapping 
$P \to Q$, as already discussed.



Understanding $h^{(n)}(p, q)$ 4.49

The importance of the kernel $h^{(n)}(p, q)$ is due to the fact that it characterizes the combined effect of the chosen jet algorithm and the hadronization mechanism represented by $H^{(n)}$.

$h^{(n)}(p, q)$ may be non-zero even if the numbers of particles in p and q do not coincide (two close partons from p may hadronize into overlapping sprays of hadrons which the jet algorithm maps into a single jet). This motivates introduction of the following quantities. Define

$h^{(n)}(p, K) = \int_{q \text{ has } K \text{ jets}} dq\; h^{(n)}(p, q)$.    4.50

This is interpreted as the probability for the partonic event p to hadronize into hadronic events recognized by the jet algorithm as having K jets. Then the fraction of L-parton events which hadronized into K-jet events is formally given by

$h(L, K) = \dfrac{\int_{p \text{ has } L \text{ partons}} dp\; \pi^{(n)}_{\mathrm{pQCD}}(p)\, h^{(n)}(p, K)}{\int_{p \text{ has } L \text{ partons}} dp\; \pi^{(n)}_{\mathrm{pQCD}}(p) \int dq\; h^{(n)}(p, q)}$.    4.51

(The integral over q in the denominator yields 1 as is seen from the definition 4.46 and the normalization 4.13.)

The quantity $h(K, K)$ is the fraction of events P generated from partonic events with K partons and recognized by the algorithm as having K jets. The quantities $h^{(n)}(p, q)$, $h^{(n)}(p, K)$ and $h(L, K)$ give more differential information. They can in principle be studied numerically using Monte Carlo event generators. In particular, it is interesting to compare the spread of q around p for a few typical p and for different jet algorithms.

It might be useful (certainly interesting) to have reasonably detailed empirical information about the kernels $h^{(n)}(p, q)$, $h^{(n)}(p, K)$, etc.
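
As an indication of how such empirical information could be collected, the following sketch estimates $h(L, K)$ of 4.51 from generator output by simple counting. It assumes unweighted Monte Carlo events for which both the number of partons L and the number of jets K found by the chosen jet algorithm have been recorded; the function name `jet_fraction_matrix` and the input format are hypothetical.

```python
from collections import Counter


def jet_fraction_matrix(pairs):
    """Estimate h(L, K) from (L, K) pairs, one pair per generated event.

    Returns a dict mapping (L, K) to the fraction of L-parton events
    that the jet algorithm recognized as K-jet events.
    """
    pairs = list(pairs)
    joint = Counter(pairs)                      # number of events with given (L, K)
    per_l = Counter(l for l, _ in pairs)        # number of events with given L
    return {(l, k): n / per_l[l] for (l, k), n in joint.items()}


if __name__ == "__main__":
    sample = [(3, 3), (3, 3), (3, 2), (4, 4)]
    for key, value in sorted(jet_fraction_matrix(sample).items()):
        print(key, value)
```

For each L the estimated fractions sum to 1 over K, which is the counting analog of the normalization remark after 4.51.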



Optimal observables in the class 4.38 4.52

From a general mathematical viewpoint the smearing 4.45 can be regarded as an example of a regularization of a singular approximation (Sec. 2.51; i.e. the pQCD approximation $\pi^{(n)}_{\mathrm{pQCD}}(p)$ of the exact probability density $\pi(P)$), transforming it into a physically meaningful form. This implies that whereas the perturbative expression 4.21 is formal, it is entirely meaningful to construct an optimal observable defined on q from $\pi^*(q)$ according to the standard recipe 2.17:

$\varphi_{\mathrm{opt}}(q) = \partial_M \ln \pi^*(q)$.    4.53



In terms of events P, the observable 4.53 is 



F.V.Tkachov hep-ph/9901444 [2 nd ed.] A theory of jet definition 2000-Jan-10 02:44 Page 22 of 45 



$f_{\mathrm{opt}}(P) = \varphi_{\mathrm{opt}}(Q[P]) = \dfrac{\int dp\, [\partial_M \pi^{(n)}_{\mathrm{pQCD}}(p)] \times h^{(n)}(p, Q[P])}{\int dp'\, [\pi^{(n)}_{\mathrm{pQCD}}(p')] \times h^{(n)}(p', Q[P])}$.    4.54

The kernel $h^{(n)}$ is given by 4.46.

If the dimensionality of q for which 4.54 is non-negligible is not too high then a brute force construction of a numerical interpolation formula to represent 4.54 might be feasible.

The formula 4.54 is valid for any jet algorithm (cone, $k_T$, etc.), and it describes a way to achieve the theoretically best precision for the parameter M with a given jet algorithm within the conventional scheme 4.38. Of course, the functions defined by 4.54 differ for different jet algorithms.

It is easy to take into account the jet-number cut. Then all one has to do is restrict considerations to the K-jet sector:

$\pi^*(q) \to \pi^*_K(q) = Z_K^{-1}\, \pi^*(q)\, \theta(q \text{ has } K \text{ jets})$,    4.55

where $Z_K$ is an appropriate normalization factor (it may depend on M). Then Eq.4.53 is modified as follows:

$\varphi_{\mathrm{opt},K}(q) = \partial_M \ln \pi^*_K(q) \times \theta(q \text{ has } K \text{ jets}) = [\varphi_{\mathrm{opt}}(q) - \partial_M \ln Z_K] \times \theta(q \text{ has } K \text{ jets})$.    4.56

Note that in practical constructions the subtracted term in square brackets may be dropped (the comment after 2.18). The corresponding observable defined on events P is

$f_{\mathrm{opt},K}(P) = \varphi_{\mathrm{opt},K}(Q[P]) = [f_{\mathrm{opt}}(P) - \partial_M \ln Z_K] \times \theta(Q[P] \text{ has } K \text{ jets})$.    4.57

This construction remains valid for any K, i.e. one can construct an optimal observable in any jet-number sector. Of course, usually one sector (which corresponds to the "canonical" value of K; see the remarks after 4.36) would yield a more informative observable than others.

Inclusion of adjacent jet-number sectors 4.58

There is nothing to prevent inclusion into consideration in 4.55-4.57 of additional jet-number sectors. Quite obviously, this would increase informativeness of the resulting aggregate observable. If the K-jet sector was the most informative one then it is natural first to include one or both adjacent sectors which correspond to (K ± 1) jets.

Comparison of different classes of observables 4.59

In the following discussion we assume that a jet algorithm is fixed (unless indicated otherwise).

We can compare different kinds of observables for measurements of a fundamental parameter M:

(1) A conventional observable of the form

$f_{\text{ad hoc},K}(P) = \theta(Q[P] \text{ has } K \text{ jets}) \times \varphi_{\text{ad hoc},K}(Q[P])$,    4.60

which involves a jet-number cut and an ad hoc function $\varphi_{\text{ad hoc},K}(q)$ usually defined on jet configurations with K jets only, as described by the $\theta$-function in 4.60.

(2) The observable $f_{\mathrm{opt},K}$ given by 4.57, which yields the best precision among observables of the form 4.60, i.e. defined via intermediacy of the chosen jet algorithm in the K-jet sector.

(3) The observable $f_{\mathrm{opt},K,K\pm1}$ defined by inclusion of the adjacent jet-number sectors (Sec. 4.58; one could include only one of the two adjacent sectors).

(4) The ideal observable $f_{\mathrm{opt}}$ (4.54) which yields the best precision among observables defined via intermediacy of a jet algorithm in all jet-number sectors.

(5) The ideal observable $f_{\mathrm{opt}}$ (4.20) defined without jet algorithms. It yields the absolutely best (Rao-Cramer) precision for the parameter M.

The observables are listed in increasing informativeness: quite obviously, each additional restriction on the form of observables is an extra obstacle for achieving the Rao-Cramer limit of precision.

Furthermore, it is clear that one can, at least in principle, construct quasi-optimal observables (Sec. 2.25) for any of the observables $f_{\mathrm{opt},K}$ and $f_{\mathrm{opt},K,K\pm1}$.

The following figure illustrates the relation between the various observables which we discuss:



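[Figure 4.61: diagram relating the observables discussed above. Labels recoverable from the figure: $f_{\mathrm{opt}}$ (the Rao-Cramer limit), $\hat f_{\mathrm{opt}}$, $\hat f_{\mathrm{opt},K,K\pm1}$, $\hat f_{\mathrm{opt},K}$, $\hat f_{\mathrm{quasi},K,K\pm1}$, $\hat f_{\mathrm{quasi},K}$, $\hat f_{\mathrm{adhoc},K,K\pm1}$, $\hat f_{\mathrm{adhoc},K}$, connected by arrows, some of which are labelled "reg".]

4.61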



Hats denote observables defined via intermediacy of a jet 
algorithm (remember that the reasoning in this section is valid 
for any fixed jet algorithm). Arrows indicate an increase in in- 
formativeness (neither absolute nor relative magnitudes of the 
increase can be predicted a priori). The absence of arrows be- 
tween two observables means their informativeness cannot be 
compared a priori (except for the case of $f_{\mathrm{opt}}$ which has the
absolutely highest informativeness).

The "reg" arrows correspond to the option of regularization 
of cuts which will be discussed separately (Sec. 4.68 and 
Section 9). 



Ways to increase informativeness 
of ad hoc observables 



4.62 



Consider a conventional ad hoc observable $\hat f_{\mathrm{adhoc},K}$
(Eq.4.60). There are at least the following ways to improve it
(cf. Fig. 4.61):

(i) Replacing the ad hoc observable with a quasi-optimal one 
(the prescriptions of Sec. 2.25). 

(ii) Inclusion of the adjacent (K ± 1)-jet sectors (Sec. 4.63).

(iii) Regularization of discontinuities (Sections 4.68 and 9).

(iv) Adjustment of the underlying jet algorithm (to be dis- 
cussed in Sec. 5.3). 

These options may be combined. In subsequent subsections 
we will discuss them in more detail together with some related 
issues. 



Inclusion of additional jet-number sectors 



4.63 



If, say, the K-jet sector was the most informative one
(usually because the lowest PT order where the dependence on
the desired parameter first manifests itself corresponds to final
states with K partons) then it is natural to first include one or
more of the adjacent, (K ± 1)-jet sectors.

The best precision is obtained if one uses a quasi-optimal 
observable in each sector, otherwise an increase of precision is 
not guaranteed. 

The simplest way to include information from additional 
jet-number sectors is to map events from each additional sector 
into one point (the scheme 4.44-4.46, 4.53 is valid irrespective
of the physical or mathematical nature of the mapping P → Q).
Then it is sufficient to determine the value of the correspond- 
ing optimal observable at that point (this means that all events 
from this sector receive the same weight). The magnitude of 
the resulting increase of informativeness could be regarded as a 
signal of whether or not a more detailed treatment might be 
warranted. Such a procedure might be a useful way to control 
the loss of information due to the restriction of the jet-number 
cut. 
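
The one-point mapping can be illustrated with a small numerical sketch. It assumes that informativeness may be represented by the Fisher information of the binned data, and it uses a made-up three-sector toy model: the function `sector_probs` and all its numbers are hypothetical and are not taken from this paper. Every event in a bin receives the same weight, the derivative of the logarithm of the bin probability with respect to the parameter, and keeping the (K+1)-jet sector as a separate point is compared with lumping it together with everything else outside the K-jet sector.

```python
import math

def sector_probs(m):
    """Toy model (an assumption): probabilities of three jet-number
    sectors as functions of a parameter m; they sum to 1."""
    p_k = 0.6 + 0.5 * m
    p_k_plus_1 = 0.1 + 0.3 * m
    return {"K": p_k, "K+1": p_k_plus_1, "rest": 1.0 - p_k - p_k_plus_1}

def bin_prob(members, m):
    """Probability of a bin formed by merging the listed sectors."""
    probs = sector_probs(m)
    return sum(probs[name] for name in members)

def bin_weight(members, m, eps=1e-6):
    """Weight d/dm ln p(m) shared by all events in the bin
    (central finite difference)."""
    up = math.log(bin_prob(members, m + eps))
    down = math.log(bin_prob(members, m - eps))
    return (up - down) / (2.0 * eps)

def information(partition, m):
    """Fisher information sum_b p_b * (d ln p_b / dm)^2 of the binned data."""
    return sum(bin_prob(b, m) * bin_weight(b, m) ** 2 for b in partition)

m0 = 0.1
coarse = [["K"], ["K+1", "rest"]]    # only "K jets or not" is recorded
finer = [["K"], ["K+1"], ["rest"]]   # the (K+1)-jet sector kept as its own point
info_coarse = information(coarse, m0)
info_finer = information(finer, m0)
print(info_coarse, info_finer)
```

In this toy model the finer partition carries more information, and the size of the gain is the kind of signal mentioned above for deciding whether a more detailed treatment of the extra sector is warranted.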

Inclusion of additional jet-number sectors seems to become
useful whenever the quantity $1 - h(K,K)$ (Eq.4.51 and the
comments thereafter) is appreciably non-zero. The difficulty
here would be if adding even one jet increases the dimension- 
ality of phase space too much to allow a meaningful construc- 
tion of observables following 4.53. Then one may be satisfied 
with defining reasonable ad hoc observables in the adjacent 
jet-number sectors. In such a situation one may find inspiration 
e.g. in the constructions of [4] such as spectral discriminators. 
Computation of spectral discriminators may be prohibitively 
expensive for raw hadronic events but some similar observ- 
ables defined on jet configurations with, say, no more than 10 
jets should not be difficult to compute. 

For instance, in the context of example 4.42, one could in- 
clude into consideration the 5-jet sector and define a similar 
observable by allowing both di- and tri-jets (an additional jet 
may have been radiated from one of the partons originally 
forming a dijet). And/or one could include the 3-jet sector and 
define an observable in it based on the fact that some pairs of 
partons may generate overlapping jets which may be seen by 
the jet algorithm as a single jet (e.g. the invariant mass distri- 
bution of single jets). 



Sources of non-optimality of the observable 4.54 



4.64 



With a fixed jet algorithm, a conventional ad hoc observable 
4.60 can always be improved, in principle, by a transition to 
the optimal observable 4.57, and by inclusion of all jet-number 
sectors (the combined effect of both tricks is represented by 
4.54). So the truly fundamental limitations of the conventional 
scheme are those associated with the sources of non-optimality 
(i.e. loss of precision of the extracted values for M) of the ob- 
servable 4.54 compared with the ideal expression 4.20. 

There are two sources of such non-optimality in the jets- 
mediated optimal observable 4.54 compared with 4.20: 



1) The variation of the expression 4.20 over the collections of 
events P which correspond to the same jet configuration q . 
Each such collection is described by the equation q = Q[P]. 

2) The discontinuities of 4.38 at the boundary of the regions
$\mathcal{P}_K$ (defined in 4.35).

Concerning the first source, the guidance here is provided
by the general criteria 2.26, 2.27 with $f_{\mathrm{quasi}} = \hat f_{\mathrm{opt}}$. One can
make the following simple observation:

The faster the variation of $f_{\mathrm{opt}}$ near some P, the more
fine-grained should be the mapping P → Q there.

4.65



Non-optimality due to discontinuities 



4.66 



The second source of non-optimality is due to discontinuities
at the boundaries of $\mathcal{P}_K$. Fig. 4.67 gives an illustration
of what happens near such a boundary.

[Figure 4.67: two plots, left and right, described in the following paragraph.]

4.67

The left figure shows $\hat f_{\mathrm{opt}}$ against $f_{\mathrm{opt}}$. It is assumed that
the latter is small outside the K-jet sector. The shaded areas
correspond to the non-optimality of $\hat f_{\mathrm{opt}}$ (recall the criterion
2.27 and the rule of thumb 2.29). $\hat f_{\mathrm{opt},K}$ differs from $\hat f_{\mathrm{opt}}$ by
being equal to zero outside $\mathcal{P}_K$ (apart from an inessential
additive constant). The right figure shows an ad hoc variable
against $f_{\mathrm{opt}}$. If the variable were the K-jet fraction, it would
be constant in $\mathcal{P}_K$.

The problem is exacerbated if the boundary of $\mathcal{P}_K$ passes
through the region of a fast variation of $f_{\mathrm{opt}}$. Note e.g. that the
probability density (from which $f_{\mathrm{opt}}$ is constructed) in QCD
varies by an order of magnitude between the regions corresponding
to K and K+1 jets because radiation of an additional
jet is accompanied by the factor $\alpha_s \sim 0.1$.

It is clear from Fig. 4.67 that forcing the observable to con- 
tinuously interpolate between its different branches 
(represented by fat lines) would eat away at the non-optimality 
(the shaded area) and thus increase precision of determination 
of the parameter M . 

The relevant notion of continuity (among the many possible 
ones in an infinitely-dimensional space of events $\mathcal{P}$) is the
C-continuity discussed at length in Sec. 3.18. (However, remember
2.50.)

Next we consider an example which shows that elimination 
of the discontinuities that are typical of the conventional ob- 
servables may result in a noticeable improvement of precision 
of measurements.

The role of continuity (shape observables vs. 3-jet fraction in measurements of $\alpha_s$) 4.68

The effect of non-optimality due to the jet-number cut is 
seen in the precision measurements of $\alpha_s$ at LEP. Here one
usually employs observables f such that

$\langle f \rangle = O(\alpha_s).$ 4.69

One example is the shape observable thrust defined by 4.27. 
Another example of an observable which satisfies 4.69 is the 3- 
jet fraction 4.41. Since the latter is a discontinuous observable 
of the conventional kind (4.60) whereas the former is continu- 
ous (even C-continuous; Sec. 3.18) and smoothly interpolates 
between different jet-number sectors, it is interesting to com- 
pare them in regard of the quality of results they yield. 

We have seen (Sec. 4.22) that shape observables such as the 
thrust are nearly optimal for measurements of $\alpha_s$ (which is
quite obvious already from 4.69). On the other hand, the
boundary between the 3-jet and 2-jet regions is located where
the probability density and $f_{\mathrm{opt}}$ vary fast: as was pointed out in
[3], 1% of 2-jet events incorrectly interpreted due to detector
errors and statistical fluctuations in hadronization as having 3
jets induces a 10% error in the 3-jet fraction because the corresponding
probabilities differ by a factor of $O(\alpha_s)$.
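
The arithmetic behind this estimate can be checked with a short sketch. It assumes, purely for illustration, that only 2-jet and 3-jet events occur and that the true 3-jet fraction is 0.1, i.e. of order $\alpha_s$; the function name is hypothetical.

```python
def relative_error_in_three_jet_fraction(three_jet_fraction, misid_rate):
    """Relative shift of the 3-jet fraction when a fraction `misid_rate`
    of the 2-jet events is wrongly counted as 3-jet events.
    Assumes that only 2-jet and 3-jet events occur."""
    two_jet_fraction = 1.0 - three_jet_fraction
    migrated = misid_rate * two_jet_fraction
    return migrated / three_jet_fraction

# 1% of 2-jet events migrating into a 3-jet sample that is ~10% of all events
rel_err = relative_error_in_three_jet_fraction(0.1, 0.01)
print(f"relative error in the 3-jet fraction: {rel_err:.3f}")
```

With these inputs the relative error comes out at about 0.09, i.e. roughly the 10% quoted above: the rarer sector is the one that suffers from migrations across the boundary.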

Note that $1 - T(P)$ smoothly interpolates between the points
in phase space where it takes its minimal and maximal values 
(0 and 1). This should be contrasted with the discontinuities of 
the 3-jet fraction 4.41. 

Another kinematic property of the shape observables such 
as the thrust is that they are rather simple energy correlators 
and thus fit into the structure of quantum field theory. This 
property ensures their superb amenability to theoretical inves- 
tigations such as the sophisticated higher order calculations for 
the thrust reviewed e.g. in [20]. 

By now it has been accepted that the $\alpha_s$ measurements (at
least of the LEP type) are done best via shape observables 
rather than the 3-jet fraction.' 

The boundaries of $\mathcal{P}_K$ and non-uniqueness of inversion of hadronization 4.70

We conclude that the discontinuities at the boundaries of 
different jet-number sectors in the space of observed events 
may be a major source of non-optimality of conventional ob- 
servables. The events near the discontinuities have two or more 
jets that are hard to resolve reliably. 

This has a simple physical interpretation in terms of non- 
uniqueness of inversion of hadronization: There is no way to 
tell whether an event with overlapping jets was generated by K 
hard partons dressed by a hadronizing QCD radiation, or by 
K+ 1 hard partons with two of them close enough to make the 
resulting jets overlap. 

So, in general, there may be more than one candidate par- 
tonic events that can be regarded as parents for a given had- 
ronic event. The best one can do is provide weights for each 
such candidate; the weight reflects the expected probability for 
the hadronic event to have been generated from a particular 
partonic candidate. 

In Section 9 we will discuss ways to assign to the same
event several different jet configurations with suitably chosen
weights, together with prescriptions for regularization of the
jet-number discontinuities.

' A large table presented in the lecture [26] did not contain a line for the
3-jet fraction which used to be a standard feature of such tables. In response
to a query, the speaker mentioned unsatisfactory experimental errors.

Definitions of jets 5 

The use of jet algorithms in data processing following the 
scheme 4.39 is motivated by the specifics of QCD dynamics. 
The arguments of Sections 2-4 provide a framework to discuss
jet definitions. A jet algorithm is a tool for construction of ob- 
servables for specific precision measurement applications 
(Sec. 4.28), and the resulting observables can be compared us- 
ing the notion of informativeness of observables (2.24). The 
usefulness of jet algorithms is due to the dynamics of pQCD 
(Sec. 4.1) which allows one to regard hadronic events as simi- 
lar to their hard partonic parents (Eq.4.16). This justifies the 
point of view that jet algorithms effect an (approximate) inver- 
sion of hadronization (Eq.4.32). 

First in Sec. 5.1 we discuss the conventional criterion used 
to compare jet algorithms. Then in Sec.5.6 we introduce a jet 
definition based on the informational abstractions of Section 3 
(the identification 3.43). The explicit purpose of such a jet 
definition is to serve as a tool for a systematic construction of 
quasi-optimal observables defined on hadronic states. In 
Sec. 5.10 we examine how the dynamical considerations
complement the picture.

The conventional approach to jet algorithms 5.1 



A common way to judge suitability of a jet algorithm to a
particular application (a precision measurement of a fundamental
parameter M; Sec.2.1) is as follows. One chooses K as
described after Eq.4.36 and evaluates the fraction of events
generated from partonic events with K partons and recognized
by the jet algorithm as having K jets. This fraction is formally
given by h(K,K) defined by Eq.4.51. (Note that
0 < h(K,K) < 1.) The larger this fraction, the better the jet
algorithm is deemed to be.
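
In practice such a fraction would typically be estimated from a simulated sample. The sketch below is one minimal way to do this, assuming unweighted events for which both the number of generated partons and the number of reconstructed jets are recorded; the function name `estimate_h` and the sample are hypothetical, and the formal definition of h(K,K) remains Eq.4.51.

```python
def estimate_h(samples, n_partons, n_jets):
    """Fraction of events generated from `n_partons` partons that the jet
    algorithm reconstructs with `n_jets` jets. `samples` is an iterable of
    (partons, jets) integer pairs, e.g. from a Monte Carlo generator."""
    selected = [jets for partons, jets in samples if partons == n_partons]
    if not selected:
        raise ValueError("no events with the requested number of partons")
    return sum(1 for jets in selected if jets == n_jets) / len(selected)

# Made-up sample: five 4-parton events and two 3-parton events
toy_sample = [(4, 4), (4, 4), (4, 3), (4, 5), (4, 4), (3, 3), (3, 4)]
h_44 = estimate_h(toy_sample, 4, 4)
print(h_44)
```

The same function with `n_jets` different from `n_partons` gives the migration fractions into the neighbouring jet-number sectors.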



This criterion amounts to an implicit definition of an ideal jet
algorithm as the one which maximizes h(K,K). The various jet
algorithms are then regarded as candidate approximations constructed
empirically.

5.2 

This definition can be related to the notion of optimal
observables as follows. Consider the example 4.42. Then at the
level of partons, the optimal observable is entirely localized on
4-parton events. At the level of hadrons, the optimal observable
$f_{\mathrm{opt}}$ is mostly localized in the 4-jet sector $\mathcal{P}_4$. If it were a
constant there then it would be entirely specified by $\mathcal{P}_4$, and the
conventional criterion simply attempts to find the shape of $\mathcal{P}_4$.
Note that there can be many parameters one may wish to
measure and so many different $f_{\mathrm{opt}}$. Since their shapes are all
different, focusing only on the shape of $\mathcal{P}_4$ is a convenient
compromise.

The advantage of the conventional criterion 5.2 is its sim- 
plicity and naturalness. 

The disadvantages are as follows: 

(i) Beyond the leading PT order, the signal is non-zero in 
other jet-number sectors. 

(ii) $f_{\mathrm{opt}}$ is not piecewise constant.

(iii) The criterion is based on a convention which although
plausible is not based on a precise argument (see however the
reasoning in Sec. 5.3).

(iv) There is no clear way to improve upon the conventional 
scheme 4.38 should one find the leading order PT arguments 
insufficient. 

The concept of optimal observables allows one to be a little 
more precise: 

Improving upon the conventional jet definition 5.3 

Within the limitations of the conventional scheme 4.38, the
best precision for M is achieved with the observable $\hat f_{\mathrm{opt},K}$
(Eq.4.57) which is entirely determined once the jet algorithm
is fixed. So it is legitimate to ask which jet algorithm maximizes
the informativeness of $\hat f_{\mathrm{opt},K}$. The definition is meaningful
because the informativeness of $\hat f_{\mathrm{opt},K}$ is given by the
following integral:

$\int dq\, \pi^*_K(q) \times \theta(q \text{ has } K \text{ jets}) \times \varphi^2_{\mathrm{opt},K}(q).$ 5.4

The best jet algorithm would then maximize this.

It is interesting to find a way to connect this with the
conventional criterion 5.2. Suppose one aims at a universal jet
definition, then it is natural to replace the last factor by a
constant. Then recall Eqs.4.55 and 4.45. In the latter, restrict the
integration to K-parton events. The resulting integral coincides,
up to a normalization, with the numerator of h(K,K); cf.
Eq.4.51.

Unfortunately, it is not clear how to derive from this a spe- 
cific jet algorithm. 

Furthermore, the conventional framework 4.38 per se imposes
a restriction on attainable precision for fundamental parameters.
If one seeks to alleviate it by, say, an inclusion of the
adjacent jet-number sectors then the best jet algorithm should
maximize the informativeness of $\hat f_{\mathrm{opt},K,K\pm1}$ rather than $\hat f_{\mathrm{opt},K}$.

Furthermore: 



It is not clear whether or not imperfections of the conventional jet 
algorithms are more important than the intrinsic limitations of the 
conventional scheme 4.38 as a whole. 

5.5 



The answer probably depends on the problem. A priori one 
cannot exclude that for some applications, an improvement of 
the scheme 4.38 as a whole via relaxation of the jet-number cut 
in the spirit of regularizations of Sec. 2.52 could be more im- 
portant than improvements of the jet algorithm only. 
A relaxation of the jet-number cut implies an inclusion into
consideration of events with at least the adjacent numbers of
jets, K ± 1.

The example of Sec. 4.68 lends credibility to this point of 
view: the transition from 3-jet fractions to shape observables in 
the measurements of $\alpha_s$ can be regarded as a trick to take into
account events from all jet-number sectors. This example is 
special in that jet algorithms can be avoided altogether in the 
improved observables. In general such luck may not occur, so 
jet algorithms are bound to remain a part of the answer. 

But if one includes into consideration events with "wrong" 
numbers of jets and/or finds a way to regularize the disconti- 
nuities at the boundaries between different jet-number sectors 
in order to make the resulting observable continuous at those 
boundaries, then the details of how the space of events is sliced 
into jet-number sectors may become less important. 



We conclude that a desirable property of a good jet algo- 
rithm is to provide options for a systematic improvement upon 
the conventional scheme 4.38. The jet algorithm we derive 
below offers such options. 

The optimal jet definition. Qualitative aspects 5.6 



The jet definition we are going to introduce deserves to be 
called optimal for two reasons: 

The kinematical reason: It involves an optimization 
that has a well-defined meaning in terms of information 
content of events and the corresponding jet configurations 
(Sec. 5.7). 

The dynamical reason: It possesses a property natu- 
rally interpreted as an optimal inversion of hadronization 
(Sec. 5.10). 

• The two properties are logically independent (at least I 
don't see a formal connection), and both lead to exactly the 
same definition 5.9. The only common element is the formal 
language in which both are phrased (the language of general- 
ized [C-continuous] shape observables, Sec. 3). 

The equivalence of the two approaches came about as a 
complete surprise. The order of presentation is determined by 
historical reasons. 

Informational definition 5.7 

From the most general point of view, the jet algorithm, op- 
erationally, is a data processing tool whose purpose is to fa- 
cilitate extraction of physical information. The resulting sim- 
plifications come at a price — a loss of information in the tran- 
sition from events to jet configurations. The most basic and 
general requirement for any data processing tool — jet algo- 
rithms not excluded — is that the distortions it induces in the 
physical information should be minimized. 

So it is natural to require that the best algorithm should 
minimize such an information loss: 



The jet configuration Q[P] must inherit maximum information 
from the original event P. 

5.8 



This in fact is similar to the conventional criterion 5.2 but 
now we would like to be more systematic in regard of inter- 
pretation of the information loss. To this end we will rely on 
the kinematical analysis of Sec. 3. 

Note that the criterion 5.8 is applicable both to experimen- 
tally observed hadronic events and to theoretical multiparton 
events in situations where radiative QCD corrections need to 
be taken into account. 

The analysis performed in Section 3 led us to the identifi- 
cation 3.43. This immediately allows us to translate the crite- 
rion 5.8 into the following form: 



$f(P) \approx f(Q[P])$ for any basic shape observable f. 5.9



The less the discrepancy between the left and right hand sides, 
the more information from P is inherited by the jet configura- 
tion Q[P]. 

The definition requires comments. 

(i) The exact equality can always be achieved in 5.9 for
Q[P] = P, so for the replacement P → Q to make sense, one
requires, heuristically speaking, that Q should have fewer jets
than P has particles. Thanks to the C-continuity of the participating
observables f, this can be achieved via two mechanisms:
a replacement of sufficiently narrow sprays of particles
by single jets (pseudoparticles from Q), and dropping particles
which carry sufficiently small fractions of event's total energy
(the so-called soft particles).

(ii) The replacement of a narrow spray of particles by one
pseudoparticle implies that the detailed structure of the events
at small correlation angles is less important than its structure
grosso modo (what is called "topology of jets"). This makes
natural the eventual occurrence in the jet definition of a parameter
interpreted as the maximal jet radius (R). On the other
hand, different f in 5.9 are differently sensitive to replacements
of sprays of particles with a single jet. The induced error
will be greater for the observables whose angular functions
(see 3.36) vary faster. The angular resolution parameter R will
then control the subclass of f for which the error is minimized
(Sec. 6.25).

(iii) The criterion 5.9 is formulated for individual events, and
the error may also depend on P, so that the approximate
equality must hold in some integral sense (Sec. 5.21). This is
where dynamical considerations may enter into the picture
(Sec. 5.31).

(iv) With 5.9, the jet algorithm can be interpreted as a trick
for approximate evaluation of (or for construction of approximations
for) complicated C-continuous observables such as the
optimal observables 4.20. The trick is unusual in that here one
simplifies the arguments, whereas normally one would simplify
the expression of the function to be computed.

(v) It is clear that the optimal jet configuration for a given
event need not be defined uniquely (more than one jet configurations
may ensure 5.9 with a comparable error). Physically,
this corresponds to the fact that different hard parton events
may hadronize into the same hadronic event. This is an important
option completely missing from conventional discussions.
We will come back to this in Sec. 9.1.

(vi) A definition such as Eq. 5.9 would be genuinely useful
only if one could control the approximation error via an estimate
which would be both simple and precise. The general
form of such estimates is discussed in Sec. 5.17.

Inversion of hadronization 5.10

A remarkable fact is that the same criterion 5.9 also ensures
what can be described as an optimal inversion of hadronization.

How well a given jet algorithm inverts hadronization is
measured by how well the kernel 4.46 is approximated by the
$\delta$-function:

$h^{(n)}(p,q) \approx \delta(p,q).$ 5.11

The only way to interpret this is via integrals with
C-continuous functions (cf. 3.43).

So, integrate both sides with an arbitrary C-continuous
function f(q). For the r.h.s. we obtain f(p). For the l.h.s., use
the definition 4.46 and obtain

$\int dq\, h^{(n)}(p,q)\, f(q) = \int dP\, H^{(n)}(p,P)\, f(Q[P]).$ 5.12

Then consider the resulting difference:

$f(p) - \int dP\, H^{(n)}(p,P)\, f(Q[P])$
$\quad = f(p) - \int dP\, H^{(n)}(p,P)\, f(P)$
$\quad\; + \int dP\, H^{(n)}(p,P)\, [f(P) - f(Q[P])],$ 5.13

where f(P) was subtracted from and added to f(Q[P]).
The first line on the r.h.s.,

$f(p) - \int dP\, H^{(n)}(p,P)\, f(P),$ 5.14

is independent of the jet algorithm. Its smallness is described
by 4.15. The subtracted term is the average value of f on hadronic
events generated by the partonic event p. We can draw
the first conclusion:

The contribution 5.14 sets a jet definition-independent limit on
how well hadronization can be inverted.

5.15

A consequence is that measuring quality of jet algorithms by
percentage of restored parton events is meaningless beyond a
certain limit. To go beyond that limit, it is necessary to go beyond
the restrictions of the scheme 4.38.

The dependence on the jet algorithm only appears in the
second line on the r.h.s. of 5.13. Therefore, to minimize 5.13
(and so the error in 5.11) it is sufficient to minimize the following
expression:

$\int dP\, H^{(n)}(p,P)\, [f(P) - f(Q[P])].$ 5.16

Such a minimization has to be accomplished for any p but
since the jet algorithm cannot depend on the unknown p, the
only meaningful general option is to minimize the expression
in square brackets for each P, and we come back to the criterion
5.9.

An interesting property is that for a fixed P, the obtained
criterion is independent of the hadronization kernel $H^{(n)}$, i.e. of
any dynamical information. This conclusion holds independently
of n, the order of pQCD corrections included into the
parton event probabilities.

• Dynamical information, however, may affect one's decisions
about the allowed error for different P. We will turn to this in
Sec. 5.31.

A quantitative definition 5.17

Our analysis of the qualitative definition 5.9 is based on
inequalities of the following factorized form to be obtained in
Section 6:

$|f(P) - f(Q)| \le C_f\, \Omega[P,Q],$ 5.18

where the constant $C_f$ is independent of P and Q = Q[P],
whereas the expression $\Omega[P,Q]$ is independent of f.

Existence of a factorized estimate 5.18 could not have been
postulated a priori. Another surprise is that $\Omega$ turns out to be
an infrared-safe shape observable of a conventional kind and,
moreover, closely related to the thrust (see Sec. 8.11).

An estimate of the form 5.18 would be sufficient for defining
jet configurations in such a way as to control the errors induced
in the observables via 5.24. The simplest option is to
specify a small positive $\omega_{\mathrm{cut}}$ (which in general may be chosen
differently for different events P) and then define Q by demanding
that it ensures that

$\Omega[P,Q] \le \omega_{\mathrm{cut}}(P).$

5.19



Since the purpose of replacing P by Q is to simplify calculations,
one would seek to satisfy the restriction 5.19 with a
minimal number of jets in Q. On the other hand, in order to
minimize the actual error induced in the transition to jets, one
would seek to minimize $\Omega[P,Q]$. To summarize:



An optimal configuration of jets $Q_{\mathrm{opt}}$ for a given event P minimizes
$\Omega[P,Q]$ while satisfying the restriction 5.19 with a minimal
number of jets.

5.20 



See Sec. 9.1 for a discussion of important implications of the
fact that the optimal jet configuration on which the minimum
of $\Omega[P,Q]$ is reached is, in general, not unique.



Errors induced in observed numbers 



5.21 



In the final respect, the observed physical value is $\langle f \rangle$ and
not f(P). So, we must study how the errors induced in f(P) for
each P propagate to the level of $\langle f \rangle$.

To this end, recall Eqs. 2.3-2.4. The replacement of events 
P by the corresponding jet configurations Q results in the fol- 
lowing expressions: 



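$\langle f \rangle_{\mathrm{th,jets}} = \int dP\, \pi(P)\, f(Q),$ 5.22

$\langle f \rangle_{\mathrm{exp,jets}} = \dfrac{1}{N} \sum_i f(Q_i),$ 5.23

where $Q = Q_{\mathrm{opt}}$ is a function of P as defined by 5.20, so that
$Q_i$ is its value on $P_i$. Using the bound 5.18, one obtains

$|\langle f \rangle - \langle f \rangle_{\mathrm{th,jets}}| \le C_f\, \omega,$ 5.24

where

$\omega = \int dP\, \pi(P) \times \Omega[P,Q] \le \int dP\, \pi(P) \times \omega_{\mathrm{cut}}(P).$ 5.25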



This expression controls the errors inherited by all interpreted
physical information ($M_W$, etc.) extracted via jets according to
5.22-5.23.

• The quantity $\omega$ together with its fluctuations can be estimated
like any other observable as the mean value and variance
of $\Omega[P,Q]$ which is computed for each event P in the
process of minimization according to 5.20.
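
As a minimal sketch of this estimate, assume that the per-event values of $\Omega[P_i,Q_i]$ have been stored during jet finding for a sample of unweighted events. The function name `estimate_omega` and the numbers are hypothetical.

```python
import math
import statistics

def estimate_omega(omega_values):
    """Sample mean and variance of the per-event Omega values, plus the
    standard error of the mean. Needs at least two events."""
    mean = statistics.fmean(omega_values)
    variance = statistics.variance(omega_values)   # unbiased sample variance
    std_error = math.sqrt(variance / len(omega_values))
    return mean, variance, std_error

# Made-up per-event values, all below a cut of 0.02
omega_values = [0.010, 0.020, 0.015, 0.005]
omega_mean, omega_var, omega_err = estimate_omega(omega_values)
print(omega_mean, omega_var, omega_err)
```

Since every event satisfies 5.19, the sample mean cannot exceed the largest cut used, in line with the second inequality in 5.25.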



(Non) optimality of jet definitions 



5.26 



The above reasoning shows that a jet algorithm can be re- 
garded as a tool for approximate evaluation of at least basic 
shape observables. However, recall that general C-continuous 
observables — including the optimal observables 4.20 — can 
be approximated by algebraic combinations of basic shape ob- 
servables (Sec. 3.35). This means that the error estimate 5.18 
will be inherited by a class of general C-continuous observ- 
ables which have appropriate regularity properties. (From the 
derivation in Section 6 this should be a C-continuous analog of 
continuous second order derivatives; cf. the discussion in 
Sec. 4.3.) The optimal observables 4.20 cannot be reasonably 



expected to have discontinuities in any derivatives, so they fall 
into this class. 

With the bound 5.18 valid for the optimal observables, the 
jet algorithm based on it can be regarded as a trick for ap- 
proximate computation of — or, equivalently, constructing ap- 
proximations for — such observables. This allows one to com- 
pare different jet definitions on the basis of the magnitude of 
the errors they induce in the relation 5.9 (more precisely, one 
looks at 5.18). 

We will use the term optimal and its derivatives in connec- 
tion with various jet definitions in the following sense: 

A jet finding prescription A is less optimal than another 
prescription B if with a given number of jets (which is a meas- 
ure of computational economy) the jet configurations produced 
by A inherit less information from the original event than is the 
case with B. In other words, the use of the scheme A makes it 
computationally harder compared with B to approximate optimal
observables 4.20 and thus to achieve the best possible precision
for fundamental parameters such as $\alpha_s$, $M_W$, etc.

(It is possible to make this more precise via inequalities for 
different $\Omega$ by analogy with the standard techniques for comparison
of norms in vector spaces. We skip this exercise because
the conventional algorithms cannot be easily represented
in the spirit of 5.18.)

We will use this notion in Section 10 to compare the jet 
definition we will derive with the conventional algorithms. 

• An obvious conclusion from the above reasoning is that the 
estimate 5.18 should be as precise as possible, i.e. its con- 
struction should not involve tricks which would overestimate 
the error. This would ensure optimality of the resulting jet 
definition. We will pay heed to this in Section 6. 

• To avoid confusion, note that the optimality of jet algo- 
rithms is a different (although metaphysically related) thing 
from the optimality of observables (Sec. 2.25), in particular 
from the optimality of observables within the restrictions of the 
scheme 4.38 with a fixed jet algorithm. 

The universal jet definition 5.27 

The simplest universal option is to choose $\omega_{\mathrm{cut}}(P)$
to be independent of the event P:

$\omega_{\mathrm{cut}}(P) = \omega_{\mathrm{cut}} = \mathrm{const}.$ 5.28

Then because the probability distribution is normalized to 1,

$\int dP\, \pi(P) = 1$, 5.29

Eq. 5.24 would be ensured with some $\omega \le \omega_{\mathrm{cut}}$. So:

The parameter $\omega_{\mathrm{cut}}$ of the universal jet definition directly controls
the errors induced in the physical information by the replacement of 
events with the corresponding configurations of jets. 

5.30 
Inclusion of dynamical information 5.31

As is clear from Eq. 5.25, one can include dynamics into
consideration by simply making $\omega_{\mathrm{cut}}$ depend on P.

All the dynamics is expressed by the probability density
$\pi(P)$. Suppose it has enhancements for certain types of events
(as indeed it does in QCD owing to collinear singularities).
Then it would be sufficient to choose $\omega_{\mathrm{cut}}(P)$ to
anticorrelate with $\pi(P)$. For instance,

$\omega_{\mathrm{cut}}(P) = \bar\omega_{\mathrm{cut}} \times \xi(P)$, 5.32

where the factor $\xi(P) \le 1$ (which should, strictly speaking, be
a shape observable) contains all the dependence on events.
Then Eq. 5.25 becomes

$\omega \le \bar\omega_{\mathrm{cut}} \times \int dP\, \pi(P)\, \xi(P)$. 5.33

Choosing $\xi$ to anticorrelate with $\pi$ would reduce the integral,
thus suppressing the overall error.

From the point of view of minimization of induced errors,
the following observation makes such a modification less
attractive. Indeed, the computational savings due to larger
$\omega_{\mathrm{cut}}$ for the events which are produced less often
may turn out to be simply not worth the trouble: one could simply
take a smaller event-independent $\omega_{\mathrm{cut}}$ from the very
beginning.

However, a non-constant $\omega_{\mathrm{cut}}(P)$ would affect the
shape of the K-jet subregions in the space of events similarly to the
difference between how, say, cone and $k_T$ algorithms see jets. So
one may wish to keep this option open for ultimate flexibility.

Note that if one modifies the conventional scheme 4.38 as a
whole (and the most important modifications seem to correspond
to relaxation of the jet-number cut; some such options are
discussed in Section 9) then the details of how K-jet sectors are
defined in the space of events may become less important.

Determining a specific form for $\xi(Q)$ is left to experts in
the dynamics of QCD. In practice high precision is not needed
here, and one could choose $\xi(P)$ to depend on Q found e.g.
using the universal jet definition with
$\omega_{\mathrm{cut}} = \bar\omega_{\mathrm{cut}}$ or some other value.
Then $\xi(Q)$ could be chosen so that $\xi^{-1}(Q)$ roughly
imitates the structure of dominant terms in $\pi(P)$. Note that
quantities such as the invariant masses of the jets and the
transverse momenta of particles in each jet, along with new
interesting characteristics such as the fuzziness of each jet
(cf. 8.19), are easily computed from the output of the optimal jet
definition which we will derive.

Lastly, an effect essentially equivalent to a modification of
$\omega_{\mathrm{cut}}$ according to 5.32 can be achieved via keeping
$\omega_{\mathrm{cut}}$ P-independent but replacing $\Omega[P,Q]$ in
5.19 by another function $\tilde\Omega[P,Q]$ such that

$\tilde\Omega[P,Q] \ge \Omega[P,Q]$. 5.34

Then the control of information loss in the transition from
events to jets would still be ensured but one could choose
$\tilde\Omega[P,Q]$ to meet some additional requirements. The
difficulty here is to keep $\tilde\Omega[P,Q]$ simple and suitable for
numerical implementation.

A detailed investigation of these options is beyond the scope 
of the present paper. 



Derivation of the factorized estimate 6

In this section we are going to obtain a factorized estimate 
of the form 5.18 which would satisfy the criterion of optimality 
of Sec. 5.26. 

Surprisingly, all one needs to obtain such an estimate is es- 
sentially an angular Taylor expansion through second order. 

Recombination matrix $z_{aj}$ 6.1

Recalling 3.36, the quantity to be estimated becomes

$|f(P) - f(Q)| = \bigl|\sum_a E_a f(\hat p_a) - \sum_j \mathcal{E}_j f(\hat q_j)\bigr|$. 6.2

To construct a bound for the r.h.s., one can only compare the
values of $f$ at some $\hat p_a$ with its values at some $\hat q_j$.
But which $\hat p_a$ to compare with which $\hat q_j$ may not be
decided a priori.

Introduce the recombination matrix $z_{aj}$ which is heuristically
interpreted as the fraction of the a-th particle's energy that
goes into the j-th jet (this interpretation will be justified below;
cf. 6.16). Impose the following restrictions on $z_{aj}$ (footnote u):

$z_{aj} \ge 0$ for any a, j; 6.3

$\bar z_a \equiv 1 - \sum_j z_{aj} \ge 0$ for any a. 6.4

One can see from the derivation that removing the restrictions
on $z_{aj}$ does not expand the eventual range of options. All
one has to do is replace $z_{aj}, \bar z_a \to |z_{aj}|, |\bar z_a|$
in 6.8 and 6.20, so that configurations not satisfying 6.3, 6.4 are
automatically disfavored compared with the corresponding boundary
points.

Non-zero values of the quantity $\bar z_a$ correspond to some
energy being left out of the formation of jets (the so-called soft
energy). We will see that this corresponds to exclusion of some
soft stray particles (the soft component of the event's energy
flow) from the formation of jets.

Allowing fractional values for $z_{aj}$:

a) fully agrees with the physical picture of production of 
colorless hadrons as a result of collective interaction of 
the underlying hard colored partons; 

b) is extremely convenient algorithmically because the 
space of all possible jet configurations for a given event 
is then path-connected, so that any jet configuration can 
be reached from any other via a continuous path, allow- 
ing efficient shortest-path search algorithms [7]. 

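We will say that the a-th particle belongs to the j-th jet if
$z_{aj} = 1$. If $\bar z_a = 1$, the particle is said to belong to
soft energy.

With the recombination matrix, rewrite 6.2 as follows (the
first line is an identical transformation of 6.2, which explains
the restriction 6.4):

$\bigl|\sum_a \bar z_a E_a f(\hat p_a) + \sum_a \bigl(\sum_j z_{aj}\bigr) E_a f(\hat p_a) - \sum_j \mathcal{E}_j f(\hat q_j)\bigr|$
$\le \bigl|\sum_a \bar z_a E_a f(\hat p_a)\bigr| + \bigl|\sum_j \bigl(\sum_a z_{aj} E_a f(\hat p_a) - \mathcal{E}_j f(\hat q_j)\bigr)\bigr|$
$\le \sum_a \bar z_a E_a |f(\hat p_a)| + \sum_j \bigl|\sum_a z_{aj} E_a f(\hat p_a) - \mathcal{E}_j f(\hat q_j)\bigr|$. 6.5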

One sees why we split particles into fragments corresponding
to jets rather than vice versa: we target situations with
fewer jets than particles, so it is desirable to arrange
cancellations between as many terms as possible (the inner sum), and
to minimize the number of positive terms in the outer sum.

u Formulas in solid boxes are part of the final result; they represent all the
information needed for algorithmic implementations.

Estimating the effects of soft energy 6.6 

The first sum on the r.h.s. of 6.5 can be estimated as follows:

$\sum_a \bar z_a E_a |f(\hat p_a)| \le C_{f,1}\, E_{\mathrm{soft}}[P,Q]$, 6.7

where $C_{f,1}$ is the maximal value of $|f|$ over all directions and

$E_{\mathrm{soft}}[P,Q] = \sum_a \bar z_a E_a$. 6.8



This quantity will play a central role in the optimal jet
definition. It is interpreted as the event's energy fraction left out
of the formation of jets (the soft energy, as we agreed to call it). It
can be visualized as a background from which jets stick out.
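The definition of the soft energy can be made concrete with a minimal Python sketch. It assumes that energies are given as fractions of the total event energy and that the recombination matrix is stored as a list of rows, one row per particle; the function name `soft_energy` and the sample event are hypothetical and serve only as an illustration.

```python
def soft_energy(energies, z):
    """Return E_soft = sum_a zbar_a * E_a, with zbar_a = 1 - sum_j z[a][j]."""
    e_soft = 0.0
    for e_a, row in zip(energies, z):
        if any(z_aj < 0.0 for z_aj in row):
            raise ValueError("restriction 6.3 violated: z_aj < 0")
        zbar_a = 1.0 - sum(row)
        if zbar_a < -1e-12:
            raise ValueError("restriction 6.4 violated: sum_j z_aj > 1")
        e_soft += zbar_a * e_a
    return e_soft


# Hypothetical event: three particles (energy fractions), two jets.
energies = [0.5, 0.4, 0.1]
z = [
    [1.0, 0.0],  # particle 0 belongs to jet 0
    [0.0, 1.0],  # particle 1 belongs to jet 1
    [0.3, 0.2],  # particle 2: half of its energy is left out of both jets
]
print(soft_energy(energies, z))  # half of 0.1, up to rounding
```

Because the expression is linear in the particles' energies, splitting a particle into two collinear fragments with the same row of the matrix leaves the result unchanged.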

Understanding the form of $E_{\mathrm{soft}}$ 6.9

Mathematically possible are many other ways to obtain a 
factorized estimate of the form 6.7. The variant with 6.8 is sin- 
gled out by the following properties: 

(i) Analytical simplicity which leads to fast algorithms. 

(ii) Linearity in energies of all particles which ensures infra- 
red safety of the resulting jet definition. 

(iii) The property that can be called maximal inclusiveness . 

For instance, also possible is a bound in terms of
$\max_a(\bar z_a E_a)$ but that would require comparison of particles'
energies, which is physically meaningless if their directions are
close.

A somewhat more meaningful option would be to perform a
smearing of the soft energy over some angular radius thus
transforming the soft energy flow into a continuous function,
and then using the maximal value of that function as an
alternative to 6.8 (the constant $C_{f,1}$ would change accordingly).
This would be similar to the so-called 'f'-cut [2], i.e. a lower
cut on the energy of the jets retained in the final jet
configuration (footnote v). However, there are three reasons why such
alternatives seem to be undesirable:

1 ) On the measurement side, they introduce non-optimality 
into the bound implying a further loss of information in the 
transition from the event to jets. 

2) Computationally, they introduce a complexity into our jet
definition unwarranted by physical considerations (footnote w).

3) On the QCD side, they are less inclusive than the expres- 
sion 6.8, i.e. they introduce into consideration subregions of 
phase space. It is a well-known fact that exclusiveness of ob- 
servables anti-correlates with the predictive power of QCD. 
For instance, a totally inclusive treatment of soft energy was 
built already into the jet definition of [8]. 

Taylor expansion in angles 6.10 

To Taylor-expand $f(\hat p_a)$ near $\hat q_j$ is just a little tricky,
and we proceed as follows. Consider the plane $\Pi$ which is normal
to the unit 3-vector corresponding to $\hat q_j$. Then map the
directions to $\Pi$: $\hat p \to \hat p^{\Pi}$, so that the angular
distances between directions are preserved near $\hat q_j$:

$|\hat p^{\Pi} - \hat q_j^{\Pi}| = O(|\hat p - \hat q_j|), \quad \hat p \to \hat q_j$, 6.11

where the l.h.s. is a Euclidean distance in $\Pi$. (An example of
such a mapping is given in Sec. 7.2.)

v The discussion in the first posting of this paper interpreted conventional
procedures incorrectly. The present version owes much in this respect to
ref. [2].

w In contrast, the conventional algorithms seem to favor the 'f'-cuts
because, apparently, there is no simple recipe to identify the particles to be
relegated to soft energy prior to recombinations.

Then $f(\hat p)$ becomes a function on $\Pi$ which we denote as
$f_\Pi(\hat p^\Pi) = f(\hat p)$. We will use the Taylor expansion in
the form of the following inequality:

$|f_\Pi(\hat p_a^\Pi) - f_\Pi(\hat q_j^\Pi) - [\hat p_a^\Pi - \hat q_j^\Pi]\cdot f'_\Pi(\hat q_j^\Pi)| \le \Delta_{aj}\, C_{f,3}$, 6.12

where $\hat p_a^\Pi - \hat q_j^\Pi$ is a vector in $\Pi$ and $f'_\Pi$ is
the gradient of $f_\Pi$. The constant $C_{f,3}$ hides maximal values of
some combinations of $f$ and its derivatives through second order.
The maximum is taken over all directions $\hat q_j^\Pi$ because we
will deal with a sum over unspecified $\hat q_j^\Pi$.

The only properties which we require the factor $\Delta_{aj}$ to have
are that it is a monotonic function of the angular distance
$|\hat p_a - \hat q_j|$, and it is such that

$\Delta_{aj} \approx |\hat p_a - \hat q_j|^2, \quad \hat p_a \to \hat q_j$. 6.13

It may otherwise be arbitrary. A modification of $\Delta_{aj}$ within
these restrictions is compensated for by an appropriate change
of $C_{f,3}$. This observation effectively decouples the form of the
r.h.s. of 6.12 from the concrete choice of the mapping to $\Pi$.

The result 6.12 can be used to estimate the second term on
the r.h.s. of 6.5 (add and subtract terms as needed to apply
6.12). Take into account the fact that the values of $f$ at
different points are in general independent, so the corresponding
expressions have to be bounded independently. Obtain the
following upper bound for the second sum on the r.h.s. of 6.5:

$C_{f,1}\bigl(\sum_j \bigl|\sum_a z_{aj} E_a - \mathcal{E}_j\bigr|\bigr)$
$+\; C_{f,2}\bigl(\sum_j \bigl|\sum_a z_{aj} E_a [\hat p_a^\Pi - \hat q_j^\Pi]\bigr|\bigr) + C_{f,3}\bigl(\sum_{j,a} z_{aj} E_a \Delta_{aj}\bigr)$. 6.14

Minimizing 6.14 6.15 

The task is to minimize 6.14 using the freedom to choose
$\mathcal{E}_j$, $\hat q_j^\Pi$, $z_{aj}$. The arbitrariness associated
with $\Pi$ and $\Delta_{aj}$ will require additional consideration to
be eliminated (Section 7).

The first term is suppressed if

$\mathcal{E}_j = \sum_a z_{aj} E_a$ for each j. 6.16

This fixes $\mathcal{E}_j$ in terms of $z_{aj}$ and is immediately
interpreted as energy conservation in the formation of jets.

The second term is suppressed if

$\mathcal{E}_j \hat q_j^\Pi = \sum_a z_{aj} E_a \hat p_a^\Pi$ for each j, 6.17

where we used 6.16. This determines $\hat q_j$ (via $\hat q_j^\Pi$) in
terms of $z_{aj}$. Note the arbitrariness due to the arbitrary $\Pi$
which will be fixed in Section 7. Anyhow:
Eqs. 6.16 and 6.17 fix the parameters of jets in terms of the
recombination matrix $z_{aj}$ which, therefore, is the fundamental
unknown in this scheme.

6.18 

• The described trick actually differs from conventional
schemes only by explicit presence of $z_{aj}$ which fully describes
the distribution of particles between jets in any jet finding
scheme with energy-momentum conservation.

With 6.16 and 6.17, only the last sum survives in 6.14. So,
redefining the inessential f-dependent constants and recalling
Eq. 6.7, we arrive at the following estimate:

$|f(P) - f(Q)| \le C_{f,1}\, Y[P,Q] + C_{f,2}\, E_{\mathrm{soft}}[P,Q]$, 6.19

where

$Y[P,Q] = \sum_{a,j} z_{aj} E_a \Delta_{aj}$. 6.20

Note that Y is linear in all particles' energies as is
$E_{\mathrm{soft}}$; cf. the comments after 6.8. Recall also that
$\Delta_{aj} = O(\theta_{aj}^2)$ for small $\theta_{aj}$, which is the
angle between the a-th particle and the j-th jet.

• Although the bound 6.19 falls short of the desired factorized 
form 5.18, its derivation did not involve any arbitrariness that 
would deserve any further discussion. 

The following two points do deserve a detailed discussion: 

(i) The arbitrariness in the choice of the angular factors
$\Delta_{aj}$ in 6.20. This will be fixed in Section 7 from simple
kinematical considerations, resulting in considerable algorithmic
simplifications.

(ii) Transition to the factorized form 5.18. 



Obtaining a factorized estimate (footnote x) 6.21

General options 6.22



Mathematically speaking, the basic bound 6.19 can be reduced
to the required factorized form 5.18 in a variety of ways.
Consider Y and $E_{\mathrm{soft}}$ as components of a two-dimensional
vector $v = (v_1, v_2) = (Y, E_{\mathrm{soft}})$. Then for a wide class of
non-negative functions $W(v)$ one can obtain an inequality of the form

$C_{f,1} v_1 + C_{f,2} v_2 \le C_{f,W} \times W(v)$, for all v. 6.23

For instance, one can take

$W(v) = (\alpha v_1^p + \beta v_2^p)^{1/p}$ 6.24

with $\alpha, \beta, p > 0$. In any event it is reasonable to restrict W
to satisfy the condition $W(kv) = kW(v)$ for all positive k (or
even to be a norm in the mathematical sense, i.e. also satisfy
$W(v + v') \le W(v) + W(v')$ for any two vectors $v$, $v'$).



x The first posting of this paper described a somewhat simplistic way to
take into account $E_{\mathrm{soft}}$ (then called $E_{\mathrm{miss}}$) in which one would minimize Y
while keeping $E_{\mathrm{soft}}$ fixed to a constant. It was justified by a somewhat
vague reference to "the physical meaning of jet counting" and, although
not incorrect, was the only step of the derivation not clarified by a precise
argument. The systematic approach outlined below attains an ultimate
analytical simplicity for the criterion, exhibits a deep connection with the
conventional cone algorithms, and results in a much faster algorithmic
implementation thanks to elimination of the algorithmically cumbersome
restriction $E_{\mathrm{soft}}$ = const.



From 6.23 one obtains Eq. 5.18 with $\Omega_W = W(Y, E_{\mathrm{soft}})$.

(Note that although thus defined $\Omega$ is not a linear function of
all particles' energies, all the dependence on the event is via
two such functions, Y and $E_{\mathrm{soft}}$. So infrared safety is not
an issue here.)



The linear choice 6.25



However, there is one choice of $W(v)$ which is singled out
by its nice properties, namely,

$W(v) = R^{-2} v_1 + v_2$. 6.26

The coefficients of the linear combination must be positive,
and the overall normalization is inessential. The specific form
of the coefficient, $R^{-2}$, is chosen for convenience of
interpretation. Its introduction makes explicit the arbitrariness in
the choice of measurement unit for angles (the role of R is
discussed in Sec. 8.14).

With W given by 6.26, one obtains 5.18 with
$C_f = \max(R^2 C_{f,1}, C_{f,2})$ and with $\Omega$ replaced by

$\Omega_R = R^{-2}\, Y + E_{\mathrm{soft}}$. 6.27



This choice is singled out by the following properties: 

(i) analytical simplicity resulting in transparency of the cor- 
responding jet definition and simplicity of implementation; 

(ii) the inequality 6.23 becomes an identical equality for

$C_{f,2} = R^2 C_{f,1}$. 6.28

This last fact means that for observables satisfying 6.28, the 
transition from the basic estimate 6.19 to the factorized one, 
Eq.5.18, via 6.26 does not entail any further loss of information 
about the event — for any event. Only the linear form 6.27 has 
this property. 
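The equality can be checked in one line. The following worked step is only a restatement of the argument, written with the symbols used above (the two constants of 6.19, the linear form 6.27, and the maximum that defines the constant of 5.18 for the linear choice):

```latex
\begin{align*}
C_{f,1}\,Y + C_{f,2}\,E_{\mathrm{soft}}
  &= C_{f,1}\,Y + R^{2} C_{f,1}\,E_{\mathrm{soft}} \\
  &= R^{2} C_{f,1}\,\bigl(R^{-2}\,Y + E_{\mathrm{soft}}\bigr)
   = C_f\,\Omega_R ,
\qquad C_f = \max\bigl(R^{2} C_{f,1},\, C_{f,2}\bigr) = R^{2} C_{f,1}.
\end{align*}
```

For any other ratio of the two constants the maximum exceeds one of them, and the factorized bound is then strictly weaker than 6.19 for events in which both Y and the soft energy are non-zero.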

We will consider the linear form 6.27 as a standard reference
point for comparison of alternatives. This issue will be further
discussed in the context of the so-called $Y$-$E_{\mathrm{soft}}$
distribution in Sec. 8.19.



Existence of the optimal jet configuration 6.29



We have obtained the factorized estimate 5.25 with $\Omega$ given
by 6.27. This allows us to define optimal jet configurations
according to the prescription 5.20.

Such a configuration $Q_{\mathrm{opt}}$ always exists. Indeed, the
quantity $\Omega_R$ is a non-negative continuous function of $z_{aj}$,
and the domain of $z_{aj}$ is compact for each fixed N(Q) (cf. 6.3, 6.4).
So $\Omega_R$ always has a global minimum in this domain.
Furthermore, the minimum value is a monotonically decreasing
function of N(Q) because each extra jet in Q adds new degrees
of freedom for minimization, driving down the minimal value
which reaches zero for all $N(Q) \ge N(P)$. So $Q_{\mathrm{opt}}$ exists
for any P and $\omega_{\mathrm{cut}} > 0$.

The global minimum need not be unique even modulo re- 
numberings of jets. 
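The monotonic behaviour can be illustrated with a small brute-force Python sketch. It is not the minimum search of [7]: it assumes spherical kinematics, scans only recombination matrices with entries 0 or 1 (each particle goes entirely to one jet or to the soft energy), and uses the fact that for a jet with total energy E and total 3-momentum q the contribution to Y is 2(E - |q|). The minima found this way are therefore only upper bounds on the true minima over fractional matrices. All function names and the sample event are hypothetical.

```python
import itertools
import math


def massless(energy, theta, phi):
    """Massless 4-vector (E, E * p_hat) for a direction given by polar angles."""
    return (
        energy,
        energy * math.sin(theta) * math.cos(phi),
        energy * math.sin(theta) * math.sin(phi),
        energy * math.cos(theta),
    )


def omega_r(particles, assignment, n_jets, radius):
    """Omega_R = Y / R**2 + E_soft for a 0/1 recombination matrix.

    assignment[a] is a jet index in range(n_jets), or None for soft energy.
    """
    jets = [[0.0] * 4 for _ in range(n_jets)]
    e_soft = 0.0
    for p, j in zip(particles, assignment):
        if j is None:
            e_soft += p[0]
        else:
            for k in range(4):
                jets[j][k] += p[k]
    # Spherical kinematics: each jet contributes 2 * (E_j - |q_j|) to Y;
    # an empty jet contributes zero.
    y = 2.0 * sum(q[0] - math.sqrt(q[1] ** 2 + q[2] ** 2 + q[3] ** 2) for q in jets)
    return y / radius ** 2 + e_soft


def best_discrete(particles, n_jets, radius):
    """Minimum of Omega_R over all 0/1 assignments with n_jets jets."""
    choices = list(range(n_jets)) + [None]
    return min(
        omega_r(particles, assignment, n_jets, radius)
        for assignment in itertools.product(choices, repeat=len(particles))
    )


# Hypothetical four-particle event with energy fractions summing to 1.
event = [
    massless(0.4, 0.3, 0.0),
    massless(0.3, 0.5, 2.0),
    massless(0.2, 2.8, 1.0),
    massless(0.1, 1.5, 4.0),
]
minima = [best_discrete(event, m, 1.0) for m in range(1, 5)]
print(minima)
```

The printed sequence does not increase with the number of jets and vanishes (up to rounding) once there are as many jets as particles, in line with the argument above.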
Fine-tuning the angular factors in Y[P,Q] 7

The form of the angular factors $\Delta_{aj}$ in 6.20 is fixed within
the arbitrariness of the scheme by simple additional
considerations: (a) conformance to relativistic kinematics; (b)
momentum conservation. The true elegance, and final justification,
of the resulting construction is in the considerable
computational simplifications resulting from a representation of the
jet finding criterion in terms of 4-vectors and Lorentz scalar
products (see after 7.10).

First, with each pair $E_a$, $\hat p_a$ one associates a massless
4-vector $p_a$, $p_a^2 = 0$ (specific expressions depend on the
representation of $\hat p_a$; see below). Then define:

$q_j = \sum_a z_{aj}\, p_a$. 7.1



This object occurs in a natural way in our construction, and we 
will call it the jet's physical 4-momentum . 



Spherical geometry ($e^+e^- \to$ hadrons in c.m.s.) 7.2

Here one emphasizes spherical symmetry. The directions $\hat p$
are interpreted as points of the unit sphere, i.e. unit 3-vectors:
$\hat p^2 = 1$. Then the 4-momentum $p_a$ associated with the pair
$E_a$, $\hat p_a$ has the energy component $p_a^0 = E_a$ and the
3-momentum component $\mathbf{p}_a = E_a \hat p_a$.

3-momentum conservation 7.3 

We must choose a mapping of the unit sphere to the plane
$\Pi$ which is normal to $\hat q_j$. A simple choice is the
stereographic projection from the point $-\hat q_j$:

$\hat p_a^\Pi = \hat p_a + t_{aj}(\hat p_a + \hat q_j) = \hat p_a + O(\theta_{aj}^2)$, 7.4

where $t_{aj} = (1-c)/(1+c)$ with $c = \hat p_a \cdot \hat q_j$. Then
Eq. 6.17 is rewritten as

$\mathcal{E}_j \hat q_j = \mathbf{q}_j + \sum_a z_{aj} E_a t_{aj} (\hat p_a + \hat q_j) = \mathbf{q}_j + O(E_a \theta_{aj}^2)$, 7.5

where $\mathbf{q}_j$ is the space-like component of 7.1.
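A short Python sketch can be used to check the properties of this projection numerically. It assumes the plane is the one tangent to the unit sphere at the jet direction, which is what the quoted coefficient implies; the helper names `dot` and `stereographic` and the sample angles are hypothetical.

```python
import math


def dot(u, v):
    """Euclidean scalar product of two 3-vectors."""
    return sum(a * b for a, b in zip(u, v))


def stereographic(p_hat, q_hat):
    """Project p_hat from the point -q_hat: p + t * (p + q), t = (1 - c) / (1 + c)."""
    c = dot(p_hat, q_hat)
    t = (1.0 - c) / (1.0 + c)
    return tuple(p + t * (p + q) for p, q in zip(p_hat, q_hat))


# Jet direction along the z axis; particle direction at polar angle theta from it.
q_hat = (0.0, 0.0, 1.0)
theta, phi = 0.3, 1.1
p_hat = (
    math.sin(theta) * math.cos(phi),
    math.sin(theta) * math.sin(phi),
    math.cos(theta),
)

p_proj = stereographic(p_hat, q_hat)
offset = tuple(a - b for a, b in zip(p_proj, q_hat))

in_plane = dot(p_proj, q_hat)            # equals 1 on the tangent plane
distance = math.sqrt(dot(offset, offset))  # distance in the plane
chord = 2.0 * math.sin(theta / 2.0)      # |p_hat - q_hat| on the sphere
print(in_plane, distance, chord)
```

The distance in the plane equals 2 tan(theta/2), while the distance on the sphere is 2 sin(theta/2), so the two agree up to corrections of third order in the angle.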

The arbitrariness in the choice of the mapping manifests
itself through the terms $O(E_a \theta_{aj}^2)$ in 7.5. The simplest
choice is to drop those terms altogether but then one would have to
impose a correct normalization on $\hat q_j$:

$\hat q_j = \mathbf{q}_j / |\mathbf{q}_j|$. 7.6

The direct normalization here cannot take one outside the
$O(E_a \theta_{aj}^2)$ arbitrariness in 7.5 (because ensuring a correct
normalization of $\hat q_j$ was part of the job of the
$O(E_a \theta_{aj}^2)$ terms). This can also be verified directly.

Fixing $\Delta_{aj}$ 7.7

$\Delta_{aj}$ can be chosen in such a way as to eliminate one
cumbersome summation over all particles in the event (which has
to be performed with 6.20 after evaluation of $\hat q_j$) and reduce
all the complexity in the computation of the criterion to
evaluation of the 4-vectors $q_j$ (footnote y). The choice is this:

$\tfrac12 \Delta_{aj} = 1 - \hat p_a \cdot \hat q_j = 1 - \cos\theta_{aj} = \tfrac12 |\hat p_a - \hat q_j|^2 = E_a^{-1}\, p_a \cdot \tilde q_j$, 7.8

where

$\tilde q_j = (1, \hat q_j)$, $\tilde q_j^2 = 0$. 7.9

This is a light-like Lorentz vector with unit energy uniquely
associated with the jet's spatial direction $\hat q_j$. Then one uses
7.1 to perform the summation over a and obtains:

$Y[P,Q] = \sum_j Y_j[P,Q] = 2 \sum_j q_j \cdot \tilde q_j$, 7.10

where the r.h.s. contains only Lorentz scalar products.

Note that in this kinematics
$q_j \cdot \tilde q_j = \mathcal{E}_j - |\mathbf{q}_j|$ but the covariant
form 7.10 is more general as we are going to see shortly.

y The described choice allows a simple incremental update of $q_j$ after a
modification of a particle's splitting between jets, which results in a major
speedup (by two orders of magnitude) of the minimum search algorithm;
for more see [7].
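The equivalence between the covariant expression and the double sum over particles and jets that survives in 6.14 can be checked with a short Python sketch. It assumes spherical kinematics, a metric with signature (+, -, -, -), and angular factors equal to 2(1 - cos θ); the helper names and the sample event are hypothetical, and the code is an illustration rather than the implementation of [7].

```python
import math


def minkowski(u, v):
    """Lorentz scalar product with signature (+, -, -, -)."""
    return u[0] * v[0] - u[1] * v[1] - u[2] * v[2] - u[3] * v[3]


def massless(energy, theta, phi):
    """Massless 4-vector (E, E * p_hat) for a direction given by polar angles."""
    return (
        energy,
        energy * math.sin(theta) * math.cos(phi),
        energy * math.sin(theta) * math.sin(phi),
        energy * math.cos(theta),
    )


def jets_from_z(particles, z):
    """Return the jets' 4-momenta q_j and the light-like vectors (1, q_hat_j)."""
    n_jets = len(z[0])
    qs, q_tildes = [], []
    for j in range(n_jets):
        q = [sum(z[a][j] * p[k] for a, p in enumerate(particles)) for k in range(4)]
        norm = math.sqrt(q[1] ** 2 + q[2] ** 2 + q[3] ** 2)  # assumed non-zero
        qs.append(q)
        q_tildes.append((1.0, q[1] / norm, q[2] / norm, q[3] / norm))
    return qs, q_tildes


def y_covariant(particles, z):
    """Y as twice the sum over jets of the scalar products q_j . q~_j."""
    qs, q_tildes = jets_from_z(particles, z)
    return 2.0 * sum(minkowski(q, qt) for q, qt in zip(qs, q_tildes))


def y_angular(particles, z):
    """Y as the double sum of z_aj * E_a * Delta_aj, Delta_aj = 2 (1 - cos theta_aj)."""
    _, q_tildes = jets_from_z(particles, z)
    total = 0.0
    for a, p in enumerate(particles):
        for j, qt in enumerate(q_tildes):
            cos_aj = (p[1] * qt[1] + p[2] * qt[2] + p[3] * qt[3]) / p[0]
            total += z[a][j] * p[0] * 2.0 * (1.0 - cos_aj)
    return total


# Hypothetical three-particle event (energy fractions) and a fractional matrix z.
particles = [massless(0.4, 0.2, 0.0), massless(0.3, 0.5, 1.0), massless(0.3, 2.6, 3.0)]
z = [[0.9, 0.1], [1.0, 0.0], [0.0, 0.8]]
print(y_covariant(particles, z), y_angular(particles, z))
```

The covariant form needs one pass over the particles to build the jets' 4-momenta and then one scalar product per jet, whereas the double sum needs the jet directions first and then a second pass over all particles.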



Cylindrical geometry (hadron collisions) 7.11



According to the standard Snowmass conventions (cf. [2]), 
here one direction (the beam axis) is singled out, and one em- 
phasizes invariance with respect to Lorentz boosts along the 
beam axis. Therefore one should use the representation 3.4 for 
particles' 4-momenta. In particular, one has to interpret ener- 
gies according to 3.7 in all the formulas related to jet defini- 
tion. Then a reasoning similar to the spherically symmetric 
case leads one to the following results: 

The j-th jet's transverse direction $\hat q_j^\perp$ is determined
similarly to 7.6 from conservation of transverse momentum:

$\hat q_j^\perp = \mathbf{q}_j^\perp / |\mathbf{q}_j^\perp|$, 7.12

with $\mathbf{q}_j^\perp$ taken from 7.1. (At this point we choose to
differ from the Snowmass definition which postulates conservation of
energy-weighted azimuthal angle in jet formation. For narrow
jets the two definitions are equivalent. On the other hand, our
definition leads to a simpler code; cf. the remark after 7.10.)

For the jet's pseudorapidity (footnote z) one has the Snowmass
definition which is invariant with respect to boosts along the beam
axis:



7.13 



For $\Delta_{aj}$ there is the following simple choice (this structure
is borrowed from [27] where it appeared in the context of
conventional jet algorithms):

$\tfrac12 \Delta_{aj} = \cosh(\eta_a - \eta_j) - \cos(\varphi_a - \varphi_j)$. 7.14

Then, surprise!, one recovers 7.10 with

$\tilde q_j = (\cosh\eta_j, \sinh\eta_j, \hat q_j^\perp)$, $\tilde q_j^2 = 0$, 7.15

where $\tilde q_j$ is also a light-like Lorentz vector uniquely
associated with the jet's spatial direction, specified in this case by
the rapidity $\eta_j$ and the transverse direction $\hat q_j^\perp$, and
also with unit energy; but now it is the unit transverse energy!
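The claim that the cylindrical angular factor is again a Lorentz scalar product can be verified with a few lines of Python. The sketch assumes a component ordering (energy, beam-axis momentum, two transverse components), a massless particle parametrized by its transverse energy, pseudorapidity and azimuthal angle, and a metric with signature (+, -, -, -); the representation 3.4 itself is not reproduced here, so these conventions and all names are assumptions made for illustration.

```python
import math


def particle(et, eta, phi):
    """Massless 4-momentum (E, p_z, p_x, p_y) with transverse energy et."""
    return (
        et * math.cosh(eta),
        et * math.sinh(eta),
        et * math.cos(phi),
        et * math.sin(phi),
    )


def q_tilde(eta_j, phi_j):
    """Light-like vector with unit transverse energy pointing along the jet."""
    return (math.cosh(eta_j), math.sinh(eta_j), math.cos(phi_j), math.sin(phi_j))


def minkowski(u, v):
    """Lorentz scalar product with signature (+, -, -, -)."""
    return u[0] * v[0] - u[1] * v[1] - u[2] * v[2] - u[3] * v[3]


def half_delta(eta_a, phi_a, eta_j, phi_j):
    """One half of the angular factor: cosh(eta_a - eta_j) - cos(phi_a - phi_j)."""
    return math.cosh(eta_a - eta_j) - math.cos(phi_a - phi_j)


# Hypothetical particle and jet direction.
et_a, eta_a, phi_a = 2.5, 0.7, 0.4
eta_j, phi_j = -0.2, 1.3

lhs = minkowski(particle(et_a, eta_a, phi_a), q_tilde(eta_j, phi_j)) / et_a
rhs = half_delta(eta_a, phi_a, eta_j, phi_j)
print(lhs, rhs)
```

Since only differences of pseudorapidities enter, shifting both values by the same amount (a boost along the beam axis) leaves the factor unchanged.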



z Note that one can compute the jet's physical rapidity directly from $q_j$.



F.V.Tkachov hep-ph/9901444 [2 nd ed.] A theory of jet definition 2000-Jan-10 02:44 Page 32 of 45 



Summary. The optimal jet definition (OJD) 7.16



Finding an optimal configuration of jets Q (Eq. 4.31) for a
given event P (Eq. 3.6) is equivalent to finding the
recombination matrix $z_{aj}$ (Sec. 6.1) that determines jets'
parameters via 7.1 and 7.6 (for spherical [c.m.s.] kinematics) or via
7.12 and 7.13 (for cylindrical [hadron collisions] kinematics).

The matrix elements $z_{aj}$ are found according to the
prescription 5.20 with $\Omega_R[P,Q]$ specified by 6.27 where
$E_{\mathrm{soft}}[P,Q]$ and $Y[P,Q]$ are defined, respectively, by 6.8
and 7.10.

The light-like Lorentz vectors $\tilde q_j$ are given by 7.9
(spherical kinematics) or 7.15 (cylindrical kinematics).



This is the simplest universal dynamics-agnostic jet
definition. Dynamical considerations can be accommodated as
described in Sec. 5.31.

Understanding the mechanism of OJD 8

To understand how the optimal jet definition (OJD;
Sec. 7.16) "finds" jets, it is sufficient to understand what jet
configurations yield minima for the criterion $\Omega_R$ depending on
the structure of the original event P.

"Fuzziness" of the event 8.1

For each integer $m \ge 1$, compute the quantity

$J_m^R(P) = \min_{N(Q') = m} \Omega_R[P,Q'] \ge 0$. 8.2

For each fixed P and R, this sequence monotonically decreases
with increasing m.

As will become clearer from what follows, the observable
$J_m^R(P)$ is best described as the event's cumulative fuzziness
relative to m axes at the angular resolution R. It receives
contributions of two kinds as seen from 6.27:

• a contribution from each of the m jets, $Y_j = 2 q_j \cdot \tilde q_j$;
this can be conveniently called the fuzziness of the j-th jet;

• a contribution from soft stray particles which is simply the
soft energy $E_{\mathrm{soft}}$.

One can describe the mechanism as follows:

OJD minimizes the cumulative fuzziness of the event by balancing
contributions from each of the jets and from the soft energy.

8.3

The functions $J_m^R(P)$ are shape observables similar to
thrust (8.11). Observables similar to 8.2 were first introduced
on the basis of conventional algorithms [28] but in our case
they are specified by explicit analytical expressions. Even
simpler analytical expressions (not involving optimization of any
kind) were introduced in [3], [4] (the so-called jet-number
discriminators) but they avoid identification of individual jets
altogether.

In order to understand the mechanism of minimization, one
notes that the analytical structure of OJD is very simple and
regular, so it is sufficient to consider a few simple examples.

A simplified jet definition (the Y-criterion) 8.4

First it is convenient to ignore $E_{\mathrm{soft}}$ in 6.19. This is
valid for events without soft particles outside a few energetic jets
and is equivalent to restricting the jet configurations Q used to
minimize the error 6.2 by requiring that all particles are included
into the formation of jets, with none relegated to the soft
energy. Formally, this is described as follows:

$\bar z_a = 0 \;\Leftrightarrow\; E_{\mathrm{soft}}[P,Q] = 0$. 8.5

Such a restriction makes the error estimate less precise, entailing
a non-optimal loss of information in the transition from P
to Q, but it is otherwise admissible.

The corresponding simplified definition is as follows:
A sub-optimal configuration of jets $Q_{\mathrm{sub}}$ for a given
event P minimizes Y[P,Q] and meets the following criterion with a
minimal number of jets:

$Y[P, Q_{\mathrm{sub}}] \le y_{\mathrm{cut}}$. 8.6

It will be convenient to refer to this as the Y-criterion.

Note that this type of criterion corresponds to $R \to \infty$
in 6.27. (For very large R, any contributions to soft energy
would be disfavored. See also the discussion in Sec. 8.14.)



Minimizing Y 8.7



Let us verify that the Y-criterion satisfies the boundary con- 
dition 4.33. 

The quantity Y[P,Q], Eq. 6.20, is sensitive to presence of sprays
of particles in the event P due to the angular factors $\Delta_{aj}$.
Consider the simplest event P with two particles carrying equal
energy (figure 8.8). Then the criterion will see either one or two
jets depending on whether or not




8.9 




(remember that we are always dealing with fractions of the total
energy of the event). For configurations with energy distributed
between particles in a less symmetric fashion, a wider jet
will be allowed for the same $y_{\mathrm{cut}}$.

Next suppose one has two pairs of particles, with a narrow angular
separation between particles of each pair, and the angular
separation between the pairs denoted as $\Theta$ (figure 8.10). Assume
$\Theta^2 \gg y_{\mathrm{cut}}$. Then if one minimizes Y[P,Q] on the
configurations Q with two jets, there is a global minimum
corresponding to the configuration with each pair combined into a jet,
and the minimum is unique up to a renumbering of the jets.

In other words, the angular factors in the expression for Y
ensure a maximal suppression of a contribution from a spray of
particles if the particles of the spray are made to constitute a
jet (i.e. the corresponding $z_{aj} = 1$ with $z_{aj'} = 0$ for all
$j' \ne j$), so that the jet's axis is automatically inside the spray.

This conclusion extends to more than two jets: if N(Q) (the 
number of jets in Q) matches the number of sprays of particles 
in the event, then the global minimum of Y is reached on the 
configuration with jets and sprays in one-to-one correspon- 
dence so that each jet comprises exactly all the particles from 
the corresponding spray. If sprays are not narrow enough then 
the allocation of particles between jets is effected in a more
dynamic fashion. 



Y-criterion and thrust 8.11



Recall the definition of the shape observable thrust, 4.27.
Suppose all particles of the event are localized within a
sufficiently narrow solid angle. Then the maximum is achieved for
some axis inside the angle so that for all particles
$\theta_a < \pi/2$. Recall 3.16 and obtain:

$1 - T = \min \sum_a E_a (1 - \cos\theta_a) \approx \tfrac12 \min \sum_a E_a \theta_a^2$. 8.12



Comparing this with 6.20 and 7.10, we see that finding the 
thrust axis in this case is equivalent to finding the single jet di- 
rection according to the Y-criterion. Then 1 - T is equivalent 
to J~ (P) with the two jet directions restricted to be exactly 
opposite (each forming one half of the thrust axis). We con- 
clude: 

The Y-criterion generalizes $1 - T$, where T is the thrust, to the
case of any number of thrust semi-axes which in the case of the
Y-criterion become jet directions.

8.13 

The same can be said about OJD because it is a modification of 
the Y-criterion. 



From the Y-criterion to OJD.
Connection with cone algorithms 8.14



OJD differs from the Y-criterion by inclusion of $E_{\mathrm{soft}}$
into the function to be minimized.

Let us discuss how OJD determines the optimal jet configuration
compared with the Y-criterion 8.6. Thanks to the analytical
simplicity and regularity of 6.27 (just two degrees of
freedom, Y and $E_{\mathrm{soft}}$, both with a simple structure and a
clear meaning) it is sufficient to consider a few simple examples.

Fix a configuration $P_{(0)}$ that consists of only one hard parton
with the 3-momentum represented by the left figure 8.15.
Then both the simplified criterion 8.6 and the optimal one 6.27
would find one jet exactly equal to the parton ($Q = P_{(0)}$), with
$Y = E_{\mathrm{soft}} = 0$.

Now deform $P_{(0)}$ by: (1) splitting the parton into almost
collinear fragments; (2) radiating a soft fragment; (3) both. The
three configurations are as follows:

[Figure 8.15: the configurations (1), (2), (3).]

What OJD and the Y-criterion would see depends on
$\omega_{\mathrm{cut}}$ and R and the magnitude of deformations (the
acollinearity angle and the energies of fragments).

In the case of (1), OJD will be yielding exactly the same
configuration as the Y-criterion (at least for not too large
acollinearity), i.e. all $\bar z_a = 0$. This is because it is more
advantageous to have the particles' energy contribute to Y where it
will be suppressed by the acollinearity angle squared (cf. 6.13,
7.8, 7.14), rather than relegate any fraction of it to
$E_{\mathrm{soft}}$ (i.e. have the corresponding $\bar z_a > 0$) where
no angular suppression is present. It is also clear that R directly
controls the threshold angle beyond which the configuration with both
particles included into the jet yields a larger value of $\Omega_R$
than the configuration with the less energetic particle relegated to
the soft energy. The exact relation between the threshold angle and R
depends on how energy is distributed between particles (see below).

In the case of (2) the Y-criterion must include the soft fragment
into the jet. However, OJD would relegate the fragment
to the soft energy (the corresponding $\bar z_a = 1$) to avoid
enhancement by the angular factors (unless R is very large). As a
result the jet will consist of the hard parton only.

A similar conclusion is reached for the case (3) where the 
jet direction as found by OJD would include only the two hard 
fragments. 

Furthermore, inclusion of an infinitesimally soft particle
$(\varepsilon, \hat p)$ into an event changes $\Omega_R$ (apart from
the overall renormalization by $1 + \varepsilon$) by
$\sim \varepsilon\, \theta_{\varepsilon j}^2 R^{-2}$ if the particle is
included into the j-th jet (with $\theta_{\varepsilon j}$ the angle
between the jet and the particle), and by $\varepsilon$ if the particle
is relegated to the soft energy. So if the particle's angular distance
from the nearest jet is $< R$ then OJD includes it into that jet.
Otherwise the particle is relegated to the soft energy.

For non-infinitesimally soft particles the threshold angle is < R.
For instance, if an isolated hard parton is split into two
equal-energy fragments separated by $2\theta$ then OJD would include
them both into one jet or relegate one to soft energy depending on
whether or not $\theta < R/\sqrt{2}$. Note that either one of the two
fragments can be relegated to soft energy, which simply means that
the global minimum is not unique. However, the probability of
occurrence of events for which the criterion has a degenerate global
minimum is theoretically zero. These issues will be discussed in more
detail in Sec. 9.1.
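The threshold $\theta = R/\sqrt{2}$ can be checked numerically. The
sketch below assumes that the linear criterion of Eq. 6.27 has the
form $\Omega_R = R^{-2} Y + E_{\mathrm{soft}}$ and uses the small-angle
form $Y \approx \sum_a E_a \theta_a^2$; this normalization of Y and all
function names are illustrative assumptions, not taken from the
implementation [7].

```python
import math

def omega_both_in_jet(energy, half_angle, R):
    """Omega_R when both equal-energy fragments are kept in one jet.

    Assumes the small-angle form Y ~ sum_a E_a * theta_a**2, with the jet
    axis midway between the fragments, so each fragment sits at half_angle.
    """
    Y = 2 * (energy / 2) * half_angle ** 2
    return Y / R ** 2  # E_soft = 0 in this configuration

def omega_one_soft(energy):
    """Omega_R when one fragment forms the jet and the other is soft."""
    return energy / 2  # Y = 0, E_soft = energy / 2

def prefers_single_jet(energy, half_angle, R):
    """True if keeping both fragments in one jet gives the smaller Omega_R."""
    return omega_both_in_jet(energy, half_angle, R) < omega_one_soft(energy)

R = 1.0
threshold = R / math.sqrt(2)
print(prefers_single_jet(1.0, 0.99 * threshold, R))  # True
print(prefers_single_jet(1.0, 1.01 * threshold, R))  # False
```

Within these assumptions the two configurations give equal values of
$\Omega_R$ exactly at $\theta = R/\sqrt{2}$, independently of the
parton's energy.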

Given the generality of the described mechanism, we arrive 
at the following conclusion: 



OJD forms jets on the basis of local structure of energy flow 
within the correlation angle R . 

Quantitatively, R is the maximal angular jet radius as probed by 
infinitesimally soft particles. 

8.16 



Furthermore, the above examples allow us to relate the parameter R
to the jet radius $R_{\mathrm{cone}}$ of the conventional cone
algorithms:

$$R_{\mathrm{cone}} \approx R/\sqrt{2}, \qquad R_{\mathrm{cone}} = 0.7 \;\leftrightarrow\; R = 1. \qquad (8.17)$$



The value 0.7 is preferred in the practice of cone algorithms on 
empirical grounds (e.g. [29]). 
To conclude: 



Sensitivity of OJD to the presence of soft stray particles is
controlled by the two parameters R and $\omega_{\mathrm{cut}}$:

R controls which particles are expelled into the soft energy because
they are too far from jets' axes (the decision also depends on the
particle's energy), and

$\omega_{\mathrm{cut}}$ effectively imposes an inclusive upper bound
on the soft energy.

8.18 



Remember that the primary role of $\omega_{\mathrm{cut}}$ is to
control the loss of information in the transition from events to the
corresponding jet configuration (Sec. 5.21).

The Y-Esoft distribution



8.19 



From the discussion in 8.14 it follows that it would be interesting
to consider the contributions to OJD of the two components, Y and
$E_{\mathrm{soft}}$, separately. For the different events of 8.15
this is shown in the left figure below:



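[Figure 8.20. Two panels in the plane with Y on the vertical axis and
$E_{\mathrm{soft}}$ on the horizontal axis. Left panel: the points
(1), (2), (3). Right panel: the origin (0) and the labels
$O(\alpha_s)$.]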



Acollinear fragmentations shift the point along the Y-axis, and soft
stray radiation, along the $E_{\mathrm{soft}}$ axis.

Then consider the figure 8.20 in the context of QCD. One sees that a
fragmentation (1) or emission of a stray soft parton (2) are, from
the perturbative viewpoint, effects of relative order $\alpha_s$,
whereas their combination (3) is an effect of relative order
$\alpha_s^2$. Similarly, if one attributes these effects to
non-perturbative "power-suppressed" corrections (i.e. suppression by
an extra power of the unnormalized total [transverse] energy of the
event) then one arrives at a similar conclusion with $\alpha_s$
replaced by $E^{-1}$.

On the theoretical side, one can make the following observations.
For definiteness consider the process $e^+e^- \to$ jets and the
distribution constructed for N(Q) = 2. Then the lowest order
quark-antiquark events are concentrated at
$Y = E_{\mathrm{soft}} = 0$ (the distribution is
$\delta(Y)\,\delta(E_{\mathrm{soft}})$). Emission of one gluon
($q\bar q g$ events in the perturbative order $\alpha_s$) creates two
$\delta$-functional terms localized along the two axes:

$$p_0\,\delta(Y)\,\delta(E_{\mathrm{soft}}) + \alpha_s \left[\, p_{1,Y}(E_{\mathrm{soft}}) \times \delta(Y) + p_{1,\mathrm{soft}}(Y) \times \delta(E_{\mathrm{soft}}) \,\right], \qquad (8.21)$$

where $p_{1,Y}(E_{\mathrm{soft}})$ and $p_{1,\mathrm{soft}}(Y)$ are
continuous functions.

This is because the third parton is either included as a whole into
one of the two jets or relegated to soft energy (configurations with
the third parton exactly at the boundary of the two corresponding
phase space regions occur with probability zero).

Emission of further gluons gives rise (apart from modifications of
the coefficients of 8.21) to configurations which populate the
internal region $Y > 0$, $E_{\mathrm{soft}} > 0$, corresponding to a
continuous distribution:

$$\alpha_s^2\, p_2(Y, E_{\mathrm{soft}}). \qquad (8.22)$$



Such a picture, with the $\delta$-functions appropriately smeared and
deformed into the internal region $Y > 0$, $E_{\mathrm{soft}} > 0$,
is expected to be seen in the data (assuming correctness of pQCD). It
may be possible to theoretically describe the smearing of
$\delta$-functions by taking into account power-suppressed
corrections as well as resummation of large collinear logarithms.
Given that the Y-$E_{\mathrm{soft}}$ distributions can be constructed
for any N(Q) and for any process involving jet production, whereas
the mechanisms behind, say, power corrections seem to be rather
universal, studying such distributions may prove to be a valuable
test of our understanding of the dynamics of QCD.

To summarize: 



The distribution of events in the Y-$E_{\mathrm{soft}}$ plane
(footnote aa) provides a direct model-independent way to quantify the
two different effects in the mechanism of hadronization, namely,
collinear fragmentation and soft radiation. So the
Y-$E_{\mathrm{soft}}$ distribution is a window on non-perturbative
QCD effects.

8.23 
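As a bookkeeping illustration of the structure described above (origin,
the two axes, and the interior of the Y-$E_{\mathrm{soft}}$ plane), the
following minimal sketch sorts per-event values by region. The
tolerance `tol` and the sample values are hypothetical; in real data
the $\delta$-functions are smeared, so a hard tolerance is only a
device for simulated partonic events.

```python
from collections import Counter

def classify(Y, E_soft, tol=1e-9):
    """Locate an event in the Y-E_soft plane (tol is an illustrative cutoff)."""
    y_is_zero = Y <= tol
    e_is_zero = E_soft <= tol
    if y_is_zero and e_is_zero:
        return "origin"        # narrow jets, no soft energy
    if e_is_zero:
        return "Y axis"        # acollinear fragmentation only
    if y_is_zero:
        return "E_soft axis"   # soft stray radiation only
    return "interior"          # both effects present

# Hypothetical (Y, E_soft) pairs, one per event.
events = [(0.0, 0.0), (0.04, 0.0), (0.0, 0.02), (0.03, 0.01)]
counts = Counter(classify(Y, E_soft) for Y, E_soft in events)
print(counts)
```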

• Note that even more detailed information is provided by the values
of fuzziness $Y_j$ of individual jets. One can e.g. study the
fluctuations of $Y_j$ within the same event, correlations, etc.

• Also, the values of $Y_j$ together with $E_{\mathrm{soft}}$ can be
used as additional parameters on top of jets' 4-momenta. This is a
natural extension of the jet-related degrees of freedom in terms of
which to parameterize the events, e.g. in the construction of event
selection procedures of the conventional type or quasi-optimal
observables (Sec. 2.25) for specific precision measurement
applications.

A word of caution: the values Y and $E_{\mathrm{soft}}$ may not
always be stable with respect to data errors (unlike the minimum
value of $\Omega_R$). This is similar to how positions of global
minima may be unstable under deformations of the function's shape. It
results in a smearing of the event distribution along the lines
$\Omega_R = \mathrm{const}$, and may impose limitations on the
precision of such tests of QCD. However, the precision requirements
here are not as high as in the Standard Model studies.



On alternative forms of the criterion 



8.24 



At this point it is convenient to discuss the ambiguity involved in
how Y and $E_{\mathrm{soft}}$ are combined to obtain a factorized
estimate of the form 5.18. After that we will also discuss a similar
ambiguity with combining contributions from different jets into one
expression Y (Sec. 8.30).



Combining Y and Esoft



8.25 



As was already pointed out (see before Eq. 6.23), the form of the
criterion which is linear in Y and $E_{\mathrm{soft}}$, Eq. 6.27, is,
mathematically, not unique. On the other hand, the qualitative
conclusions about how the criterion organizes particles into jets and
soft energy (as discussed above in connection with 8.15) remain valid
for any $\Omega$ based on any valid choice of W(v) in 6.26. In
particular, the arguments around Eqs. 8.21-8.22 remain valid. This
makes it worthwhile to examine whatever further arguments one may
find in favor of, or against, the simple linear form 6.27.

First of all note that the physically most important degree of
freedom in W(v) is adequately represented by the free parameter R. To
discuss the remaining ambiguities it is convenient to limit the
discussion to the degree of freedom represented by the parameter
$B \ge 1$ in the following alternative expression for $\Omega_R$:

$$\Omega_R = \left[ \left( R^{-2} Y \right)^B + E_{\mathrm{soft}}^B \right]^{1/B}. \qquad (8.26)$$



This expression is infrared safe and leads to only marginally slower
code (the formulas for derivatives used in the algorithm [7] become
more complex, but this affects only a small part of the entire code,
which is executed infrequently provided the covariant form of Y is
used). In the limit $B \to \infty$ the function becomes non-smooth:

$$\Omega_R = \max\left( R^{-2} Y,\; E_{\mathrm{soft}} \right), \qquad (8.27)$$

which results in considerable algorithmic complications due to
nonexistence of derivatives at some points in the space of
recombination matrices. The same problem will be manifest for large B
in the form of numerical instabilities.

aa One fixes the number of jets, and for each event finds the
corresponding optimal jet configuration by minimizing $\Omega_R$. Y
and $E_{\mathrm{soft}}$ are obtained as a by-product.

So large values of B are excluded by the requirement of algorithmic
simplicity. The same requirement favors the linear choice B = 1.
However, most algorithmic efforts in a computer implementation of the
corresponding minimum search algorithm [7] are spent on a proper
handling of the recombination matrix $z_{aj}$ and the computation of
$q_j$, etc., and only a fraction of the total code deals with the
formulas such as 7.10 and 6.27, so that the values B > 1 are not,
strictly speaking, excluded.

The linear choice B = 1 is also singled out by a similar requirement
of analytical simplicity (needed to facilitate theoretical studies of
e.g. power-suppressed effects using the Y-$E_{\mathrm{soft}}$ plot).

Considering the alternative values for B, one might be tempted to add
$R^{-2} Y$ and $E_{\mathrm{soft}}$ in quadrature (B = 2). The
corresponding region $\Omega < \omega_{\mathrm{cut}}$ would be a
quarter of an ellipse (cf. the dotted boundary in the right figure of
8.20). As a further example, the rectangular region corresponds to
$\Omega < \omega_{\mathrm{cut}}$ with $\Omega$ defined using
$B = \infty$ (Eq. 8.27). A typical shape of the region
$\Omega_R < \omega_{\mathrm{cut}}$ for the linear choice 6.27 is
shown with the dashed straight line; larger R correspond to steeper
slopes.
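The family of criteria labelled by B can be explored numerically. The
sketch below assumes that Eq. 8.26 combines $R^{-2} Y$ and
$E_{\mathrm{soft}}$ as $[(R^{-2}Y)^B + E_{\mathrm{soft}}^B]^{1/B}$, so
that B = 1 is the linear form, B = 2 is the sum in quadrature and
$B \to \infty$ gives the maximum; the function names and sample values
are illustrative.

```python
def omega_B(Y, E_soft, R=1.0, B=1.0):
    """Combine R**-2 * Y and E_soft with exponent B >= 1 (assumed Eq. 8.26)."""
    a = Y / R ** 2
    b = E_soft
    m = max(a, b)
    if m == 0.0:
        return 0.0
    # Factor out the larger term so that large B does not underflow to zero.
    return m * ((a / m) ** B + (b / m) ** B) ** (1.0 / B)

def omega_max(Y, E_soft, R=1.0):
    """The non-smooth limit B -> infinity."""
    return max(Y / R ** 2, E_soft)

Y, E_soft = 0.03, 0.05  # hypothetical values for one event
for B in (1, 2, 8, 64):
    print(B, omega_B(Y, E_soft, B=B))
print("max", omega_max(Y, E_soft))
```

The value decreases monotonically with B and approaches the maximum of
the two terms. This only evaluates the criterion; the instabilities
mentioned in the text concern its derivatives during minimization.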

The position of each event on the plane is determined by minimization
of $\Omega$ and therefore depends on its specific form, so that a
straightforward comparison of shapes of the regions
$\Omega < \omega_{\mathrm{cut}}$ is in general not meaningful.
However:



For sufficiently small deviations of the fragmented event from the
parent partonic event (the neighborhood of the origin of the
Y-$E_{\mathrm{soft}}$ plot, which corresponds to very small
$\alpha_s$) the resulting values of Y and $E_{\mathrm{soft}}$ will
not depend on the specific form of $\Omega$.

8.28 



This is because the minima of $\Omega$ tend to correspond to
configurations with $z_{aj} = 0$ or 1 (Sec. 9.1), which ensures some
stability of the resulting jet configurations with respect to small
deformations (unless the event is such that $\Omega$ has a degenerate
global minimum, a situation which occurs with probability zero). This
phenomenon of "snapping" seems to persist for all $\Omega$ for which
the corresponding function W(v) (recall the reasoning in Sec. 6.22)
is a convex function of the 2-dimensional vector v (i.e. a norm in
the mathematical sense). For instance, the already described (in
Sec. 8.14) mechanism of balance between Y and $E_{\mathrm{soft}}$,
which makes a particle as a whole either belong to a jet or be
relegated to soft energy, remains operative irrespective of whether
one compares Y and $E_{\mathrm{soft}}$ or their positive powers, as
would be the case with the choice 6.24.

The proposition 8.28 means that in some neighborhood of the origin of
the Y-$E_{\mathrm{soft}}$ plot, the distribution of events is
independent of the specific choice of $\Omega$ as long as the
corresponding function W(v) is a norm, so that all one has to take
into consideration is the shape of the region
$\Omega < \omega_{\mathrm{cut}}$.

Then from a purely geometrical point of view (justified by the
tradition of using visual arguments in the construction of e.g. cone
jet algorithms), one can reason as follows. There are two alternative
ways to distort the parent parton event (the point (0) in Fig. 8.20):
one is to add a non-negligible soft background to narrow jets (the
arrow directed to the right from the origin in 8.20); the other is to
make wider jets without much soft background (the arrow directed
upwards). It is geometrically clear that the situations where one of
these mechanisms dominates correspond better to the notion that the
number of jets in the resulting event stayed the same as in the
parent parton configuration, than a simultaneous effect of both
mechanisms (the diagonal direction). In the latter case one would
prefer to count the same number of jets only if both distortions are
reduced. This seems to disfavor the shapes of the region
$\Omega < \omega_{\mathrm{cut}}$ which are protruding along the
diagonal (such as the rectangle in the right Fig. 8.20), and favor
the more "flat" boundaries like the one corresponding to the linear
choice 6.27.

In the final analysis, the best argument for fine-tuning the form of
$\Omega$ may be based on dynamical considerations such as suppressing
sensitivity to higher-order and hadronization effects. The pattern
exhibited by the right figure 8.20 and the additivity of small
perturbative corrections (at least for small $\alpha_s$) seems to be
rather universal and again favors the linear choice B = 1 (which
leads to Eq. 6.27).

To conclude:

There seems to be no obvious general argument to counter the appeal
of simplicity of the linear form of the criterion, Eq. 6.27, which
also retains the most important degree of freedom represented by the
parameter R.

The linear form is compatible with the additive nature of small
perturbative corrections and seems to conform well to the intuitive
notion of which deformations of the parton event preserve the "number
of jets" best.

8.29

Combining contributions from different jets 8.30

A similar ambiguity may be seen in the way contributions from
different jets, $Y_j$, are combined into a single expression Y (the
transformation of the sum over j in the transition from the second to
third line in 6.5). Most arguments of Secs. 6.22-6.25 remain valid
here too. However, in the case of combining Y and
$E_{\mathrm{soft}}$ in a factorized estimate the problem was due to
two unknown coefficients $C_{f,j}$ (see 6.19) which vary
independently under arbitrary changes of the observable f. In the
present case there are no such unknown coefficients, and the
inequality $|\sum_j Y_j| \le \sum_j |Y_j|$ becomes an exact equality
for non-negative $Y_j$.

This seems to leave the linear form 7.10 as the only viable option
here.

Multiple jet configurations



9 



If jet algorithms are supposed to invert hadronization then 
one should also take into account that there may be more than 
one (perhaps a continuum of) partonic configurations that could 
hadronize into a given hadronic state (Sec. 4.70). The problem 
is the more severe, the more pQCD corrections are taken into 
account. It is clear that at some level of precision, this effect 
must be taken into account in the theory of jet algorithms. 

The multiplicity of parent partonic events for a given hadronic event
P is reflected in a multiplicity of allowed jet configurations. In
the context of OJD this is manifest in the fact that any jet
configuration Q that satisfies 5.19 is a valid candidate, and the
induced error can still be controlled via 5.24 and 5.25. The global
minimum is the best choice from the viewpoint of minimizing the
overall error but there are at least two cases when a unique choice
may be hard to make. One case is the potential occurrence of multiple
global minima (Sec. 9.1). Another is when equality is reached in 5.19
(Sec. 9.17). Both situations occur with theoretical probability zero
but acquire significance in the presence of detector errors.

The phrase "a unique choice may be hard to make" means that there is
a discontinuity in the mapping $P \to Q$. As any discontinuity, this
is manifest as an instability that causes an enhancement of errors
for events near the discontinuity (Sec. 2.47 and [4]).
Algorithmically, the handling of the problem of multiple jet
configurations is a special case of the general method of
regularization (Secs. 2.51, 2.52).

Two remarks: 

(i) The options discussed here go beyond the conventional 
data processing scheme 4.38. 

(ii) These options emerge naturally in the theory of OJD but 
they can be used in conjunction with conventional algorithms 
although in somewhat cruder forms because then one would 
not have the fine control of the weights offered by OJD 
(Sec.9.16). 



Multiple minima 



9.1 



Numerical experiments (sufficiently extensive to accept these
conclusions; see footnote bb) show the following:

(i) Multiplicity of local minima. Quite often (enough so that the
issue may not be ignored, depending on the problem), there is more
than one local minimum for the expression 6.27 as a function of Q
(or, more precisely, $z_{aj}$) for fixed P and R. The simplest
example is an event consisting of exactly three particles with equal
energies and arranged symmetrically. Then among all possible 2-jet
configurations, there are three isolated global minima with the same
value of $\Omega[P, Q]$. If one deforms the event slightly (footnote
cc) then the three minima remain local minima but in general only one
will be the global minimum.

The number of local minima is not large (O(1) on the average). It
seems to correlate positively with the number of hard partons in the
underlying partonic event.



bb We used several hundred events generated by Jetset/Pythia [30] for
typical processes studied at CERN and FNAL. Note that the mechanism
of how $\Omega_R$ organizes particles into jets is essentially
insensitive to the underlying physics. We have also used some simple
events constructed manually to test the findings (iii) and (iv) in a
more controlled manner.

cc A deformation may involve: a deformation of any particle's
parameters; splitting particles into slightly acollinear fragments;
adding soft arbitrarily directed particles to the event.



(ii) At the points of minima $z_{aj}$ are equal to either 0 or 1. In
other words, particles tend to belong to a jet or the soft energy as
a whole rather than being split between them. In this respect OJD is
similar to the conventional algorithms. (However, first principles do
allow solutions with fractional $z_{aj}$.)

(iii) Minima $z_{aj}$ are localized at isolated points. This directly
follows from (ii).
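The three-particle example of (i) can be reproduced with a brute-force
search. The sketch below restricts the search to 0/1 recombination
matrices (in line with finding (ii)) and assumes, for illustration
only, that $\Omega_R = R^{-2} Y + E_{\mathrm{soft}}$ with
$Y = 2 \sum_j (E_j - |\mathbf{q}_j|)$, i.e. each jet axis is taken
along the jet's 3-momentum and energies are normalized to unit total.
It is not the minimum search algorithm of [7].

```python
import itertools
import math

def omega_R(particles, assignment, n_jets, R=1.0):
    """Omega_R = Y / R**2 + E_soft for a 0/1 assignment.

    particles: list of (E, px, py, pz); assignment[a] is a jet index in
    range(n_jets) or None for soft energy. Assumes (for illustration)
    Y = 2 * sum_j (E_j - |q_j|), each jet axis along the jet momentum.
    """
    jets = [[0.0, 0.0, 0.0, 0.0] for _ in range(n_jets)]
    e_soft = 0.0
    for p, j in zip(particles, assignment):
        if j is None:
            e_soft += p[0]
        else:
            for i in range(4):
                jets[j][i] += p[i]
    Y = 2.0 * sum(q[0] - math.sqrt(q[1] ** 2 + q[2] ** 2 + q[3] ** 2) for q in jets)
    return Y / R ** 2 + e_soft

def canonical(assignment, n_jets):
    """Jet content up to relabeling of jets, plus the set of soft particles."""
    jets = frozenset(
        frozenset(a for a, j in enumerate(assignment) if j == k)
        for k in range(n_jets)
    )
    soft = frozenset(a for a, j in enumerate(assignment) if j is None)
    return jets, soft

# Three equal-energy massless particles at 120 degrees in a plane.
particles = [
    (1 / 3, math.cos(phi) / 3, math.sin(phi) / 3, 0.0)
    for phi in (0.0, 2 * math.pi / 3, 4 * math.pi / 3)
]

n_jets = 2
values = {}
for assignment in itertools.product([0, 1, None], repeat=len(particles)):
    if not all(k in assignment for k in range(n_jets)):
        continue  # keep only configurations with two non-empty jets
    values[canonical(assignment, n_jets)] = omega_R(particles, assignment, n_jets)

best = min(values.values())
minima = [c for c, v in values.items() if abs(v - best) < 1e-12]
print(best, len(minima))
```

Under these assumptions and R = 1 the search finds three degenerate
minima, each with one particle relegated to soft energy and
$\Omega_R = 1/3$; the configurations with two particles merged into
one jet give $\Omega_R = 2/3$.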

The connection of multiple local minima with the multi- 
plicity of jet configurations as produced by conventional algo- 
rithms is discussed in Section 10. 

The occurrence of local minima in addition to the global one poses
the following problem. No minimum search algorithm can absolutely
guarantee that it has found the global minimum, especially for
problems in O(100) dimensions (recall that the dimensionality in our
case is $N_{\mathrm{particles}} \times N_{\mathrm{jets}}$). The best
one can hope to achieve is to reduce the probability of missing the
true global minimum, e.g. by repeating searches from random initial
configurations. Numerical experiments show that, given the efficiency
of the minimum search algorithm described in [7], an exhaustive
search of all local minima with a high confidence level does not
constitute a practical difficulty.

The possible occurrence of several global minima poses the following
problems.

On the one hand, the probability of production of events P for which
$\Omega[P,Q]$ as a function of Q has a degenerate global minimum is
zero. Indeed, there is a finite probability to produce events with
exactly N particles. In the subset of such events, the events for
which $\Omega[P, Q]$ as a function of Q has a degenerate global
minimum form a set of measure zero because minima are localized at
isolated points. The probability density is a continuous function,
whence follows the proposition.

Small deformations of such an event (denote it as
$P_{\mathrm{disc}}$) in general leave only one global minimum, as
shown in Fig. 9.2, where the curves describe trajectories of the
minima, with solid parts corresponding to global minima and dashed
parts, to local minima.

However, different deformations cause different global minima to
survive. This means that with a non-zero probability detector errors
may distort some events close to $P_{\mathrm{disc}}$ so that a local
minimum will be seen as a global one.

Consider an observable defined via intermediacy of the mapping
$P \to Q$:

$$P \to Q \to \varphi(Q[P]) = f(P). \qquad (9.3)$$

This differs from the conventional scheme 4.38 in that now we do not
assume any cut to be applied to the events. Any such cut is assumed
to be incorporated into $\varphi$ as a $\theta$-factor (cf. 2.40).

Such an observable will in general be discontinuous near
$P_{\mathrm{disc}}$ because the values $\varphi(Q_k)$, where $Q_k$
are different global minima for $P_{\mathrm{disc}}$, are in general
all different. Then slight deformations of P would cause erratic
jumps of f(P) between all $\varphi(Q_k)$, causing a non-optimal
sensitivity of the observable to detector errors. One can suppress
these fluctuations using the trick described below.



[Figure 9.2. Trajectories of the minima $Q_1[P]$ and $Q_2[P]$ as
functions of the event P near $P_{\mathrm{disc}}$.]

The regularization trick



9.4 



We are going to construct an observable $f_{\mathrm{reg}}$ which
would coincide with f away from $P_{\mathrm{disc}}$. But near
$P_{\mathrm{disc}}$, it would in general perform a continuous
interpolation between different branches of f.

Let $Q_k$ be the candidate local minima (they are actually functions
of P; we will later discuss how $Q_k$ can be selected). Suppose one
can find weights $W_k$ (footnote dd) normalized so that

$$\sum_k W_k = 1. \qquad (9.5)$$

Then one would define

$$f_{\mathrm{reg}}(P) = \sum_k W_k\, \varphi(Q_k). \qquad (9.6)$$



If the weights $W_k$ depend on P continuously then so does the
expression 9.6, and the discontinuity of the mapping $P \to Q$ is
effectively masked.

One should also ensure that the expression 9.6 coincides with 9.3 for
P outside some neighborhood O of the point $P_{\mathrm{disc}}$. For
that, it is sufficient that the weights vary in such a way that only
one of them remains non-zero outside O: the one which corresponds to
the true global minimum $Q_k$. Then outside O only one term in the
sum 9.6 survives with a unit weight, and $f_{\mathrm{reg}}(P)$
coincides with f(P) defined by 9.3.

The weights $W_k$ can be heuristically interpreted as probabilities
that the event P resulted from hadronization of the partonic
configuration $Q_k$. See, however, the warning preceding 2.58.

In terms of collections of events, the described mechanism amounts to
a replacement of the initial collection of events P with a collection
of weighted (pseudo) events Q:

$$\{P_i\}_i \to \{Q_k, W_k\}_k, \qquad (9.7)$$

where the r.h.s. comprises all jet configurations for all $P_i$. Then

$$\frac{1}{N}\sum_i f(P_i) \;\to\; \frac{1}{N}\sum_i f_{\mathrm{reg}}(P_i) = \frac{1}{N}\sum_k W_k\, \varphi(Q_k), \qquad (9.8)$$

where

$$N = \sum_k W_k \qquad (9.9)$$

in virtue of 9.5.

A prescription for Wk based on linear splines

9.10



Let us now present a simple prescription for constructing 
such weights. It should be remembered that there is no a priori 
recipe here (apart from the general desire to obtain a quasi- 
optimal observable; see Sec. 2.25). Also, it is not always neces- 
sary to eliminate all discontinuities: one may decide to patch 
some discontinuities and leave alone the rest (e.g. the kinds of 
discontinuities that occur seldom), depending on the problem. 
So the prescriptions described in what follows should be con- 
sidered as merely examples. 

Fix an event P and let $Q_0$ be the point of global minimum of the
function $\omega(Q) = \Omega[P, Q]$ with
$\omega_0 = \Omega[P, Q_0]$.

Choose the regularization parameter r > 0 so that

$$\omega_0 + r < \omega_{\mathrm{cut}}. \qquad (9.11)$$

Recall that r should at least satisfy the restriction 2.59 where
$\sigma_{\mathrm{meas}}$ should now be taken to be a typical error
induced in $\omega_0$ by detector errors in P. Eq. 9.11 can be
satisfied together with the mandatory restriction 2.59 provided

$$\omega_0 \ll \omega_{\mathrm{cut}} \qquad (9.12)$$

in the sense that $\sigma_{\mathrm{meas}}$ is small enough that the
condition $\omega_0 < \omega_{\mathrm{cut}}$ is determined with a
high reliability. The cases when 9.11 cannot be satisfied will be
considered separately (Sec. 9.17).

Let $Q_k$, k = 1, ..., be all local minima of $\omega(Q)$ (with the
corresponding $\omega_k = \omega(Q_k)$) which satisfy the restriction

$$\omega_k < \omega_0 + r. \qquad (9.13)$$

Compute the weights $W_k$, k = 0, 1, ..., from the conditions:

$$W_k \propto r^{-1}\,(r + \omega_0 - \omega_k). \qquad (9.14)$$

This together with the normalization 9.5 determines $W_k$.
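The prescription 9.13-9.14 together with the normalization 9.5 and the
definition 9.6 can be sketched as follows. The function names and the
sample numbers are illustrative; candidates violating 9.13 are simply
given zero weight, which is equivalent to dropping them from the sum.

```python
def spline_weights(omegas, r):
    """Weights W_k of Eqs. 9.13-9.14, normalized as in Eq. 9.5.

    omegas: values omega_k = Omega[P, Q_k] at the candidate local minima
    (the smallest one is taken as omega_0); r > 0 is the regularization
    parameter. Minima with omega_k >= omega_0 + r get zero weight.
    """
    if r <= 0:
        raise ValueError("r must be positive")
    omega_0 = min(omegas)
    raw = [max(0.0, (r + omega_0 - w) / r) for w in omegas]
    total = sum(raw)  # at least 1: the global minimum contributes 1
    return [x / total for x in raw]

def f_reg(phis, weights):
    """Regularized observable of Eq. 9.6: sum_k W_k * phi(Q_k)."""
    return sum(w * phi for w, phi in zip(weights, phis))

omegas = [0.10, 0.12, 0.30]   # hypothetical values at three local minima
phis = [1.0, 0.0, 5.0]        # hypothetical values phi(Q_k)
W = spline_weights(omegas, r=0.05)
print(W, f_reg(phis, W))
```

Two exactly degenerate minima receive equal weights, and a weight goes
to zero linearly as $\omega_k$ approaches $\omega_0 + r$, which is what
makes the regularized observable continuous.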

The described trick eliminates C-discontinuities due to degenerate
global minima at least for events which satisfy 9.12. This is because
the values $\omega_k$ vary C-continuously with the event P in
general. However, this is not always the case, as discussed below.



Cheshire local minima 



9.15 



Indeed, any event can be C-continuously deformed into any other
event, and the number of local minima of $\Omega$ in general differs
for different events. This means that some local minima disappear
under small deformations (footnote ee). This may somewhat spoil the
regularization effect of the prescription if one of the local minima
happens to be such a Cheshire minimum and vanishes while the
corresponding weight $W_k$ is non-zero. This is more likely for
larger values of the regularization parameter r because the
regularization procedure would then see more local minima.

It is possible to detect the Cheshire minima in the context of the
minimum search algorithm described in [7] by looking at the values of
the gradient of $\Omega$. If these are smaller than some threshold
then a corresponding factor should be introduced into 9.14 to effect
a suppression. Then the weight $W_k$ would vanish in a continuous
fashion as the corresponding $Q_k$ approaches the point where the
local minimum disappears.

In any event, it seems that the effect of Cheshire minima 
could be dangerous only if detector errors are large enough that 
there is a sufficiently sizeable fraction of events whose local 
minima could be regarded as candidate global minima (and 
only a fraction of that fraction would exhibit the effect). In such 
a case it cannot be excluded that one might have to abandon 
the scheme 9.3 altogether in favor of the more complicated C- 
continuous observables, e.g. those constructed along the lines 
of [4]. 



Comments for conventional jet algorithms

9.16

It is possible to use the described scheme with conventional
algorithms. In the context of, say, cone algorithms, $Q_k$ may be
candidate jet configurations obtained e.g. for different initial
configurations of cones (or other variations of the algorithm). In
this case one would have to take all weights $W_k$ equal. Then as P
varies, the weights will no longer vary continuously but since in
general only one weight jumps at a time, the discontinuities would be
mitigated.

dd We adopt the convention that i labels events P and k labels jet
configurations Q. So $W_i$ and $W_k$ denote different arrays of
numbers.

ee Remember that there always is at least one global minimum for any
event.

A better option is to evaluate $W_k$ for each $Q_k$ using
$\omega_k = \Omega_R[P, Q_k]$ with $\Omega_R$ borrowed from OJD
(Eq. 6.27) even if $Q_k$ are found using a conventional jet
algorithm. This is possible because all one needs is the
corresponding recombination matrices. These are easily restored from
the output of any conventional jet algorithm.
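Restoring a recombination matrix from the output of a conventional
algorithm can be sketched as follows. The input format (one list of
particle indices per jet) is a hypothetical convention, and the soft
energy is computed under the assumption that
$E_{\mathrm{soft}} = \sum_a (1 - \sum_j z_{aj}) E_a$, i.e. particles
assigned to no jet count entirely as soft energy.

```python
def recombination_matrix(n_particles, jets):
    """0/1 recombination matrix z[a][j] from jets given as lists of indices.

    jets: hypothetical output format of a conventional algorithm, one list
    of particle indices per jet. Particles in no jet keep an all-zero row,
    i.e. they are counted as soft energy.
    """
    z = [[0.0] * len(jets) for _ in range(n_particles)]
    for j, members in enumerate(jets):
        for a in members:
            if any(z[a]):
                raise ValueError(f"particle {a} assigned to more than one jet")
            z[a][j] = 1.0
    return z

def soft_energy(energies, z):
    """E_soft = sum_a (1 - sum_j z[a][j]) * E_a (assumed definition)."""
    return sum((1.0 - sum(row)) * e for e, row in zip(energies, z))

energies = [0.5, 0.3, 0.15, 0.05]          # hypothetical particle energies
z = recombination_matrix(4, [[0, 2], [1]])  # particle 3 is left unclustered
print(z, soft_energy(energies, z))
```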



Regularizing the cut $\Omega[P,Q] < \omega_{\mathrm{cut}}$

9.17



In the situation we have just considered all candidate jet 
configurations have the same number of jets. Next we suppose 
that this is no longer the case. 

In the notations of 9.10, assume that r cannot be chosen small enough
to satisfy 9.11 for whatever reason (e.g. because the condition 9.12
is not satisfied). It is assumed that


9.18 



K is the minimal number of jets for which this condition is 
achieved. 

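Define $\alpha$ to be a function of $\omega_0$ that interpolates
between the values 1 and 0 at the ends of the regularization
interval, e.g.:

$$\alpha = \begin{cases} 1 & \text{if } \omega_0 < \omega_{\mathrm{cut}} - \tfrac{1}{2} r, \\ 0 & \text{if } \omega_0 > \omega_{\mathrm{cut}} + \tfrac{1}{2} r, \\ r^{-1}\left( \tfrac{1}{2} r + \omega_{\mathrm{cut}} - \omega_0 \right) & \text{otherwise.} \end{cases} \qquad (9.19)$$

Also define $\beta = 1 - \alpha$.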

Heuristically, $\alpha$ is interpreted as the probability that the
event P has K jets; then $\beta$ is the probability that the event
has at least K + 1 jets.
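The interpolation can be written compactly as a clamped linear ramp.
The sketch below assumes the piecewise-linear form in which $\alpha$
equals 1 below $\omega_{\mathrm{cut}} - r/2$, equals 0 above
$\omega_{\mathrm{cut}} + r/2$, and is linear in between; the function
names are illustrative.

```python
def alpha(omega_0, omega_cut, r):
    """Weight of the K-jet sector: linear ramp from 1 to 0 across the cut.

    Equivalent to the three-branch definition: the clamp reproduces the
    constant values 1 and 0 outside the regularization interval.
    """
    if r <= 0:
        raise ValueError("r must be positive")
    ramp = (0.5 * r + omega_cut - omega_0) / r
    return min(1.0, max(0.0, ramp))

def beta(omega_0, omega_cut, r):
    """Weight of the (K+1)-jet sector."""
    return 1.0 - alpha(omega_0, omega_cut, r)

for omega_0 in (0.80, 0.95, 1.00, 1.05, 1.20):
    print(omega_0, alpha(omega_0, 1.0, 0.2), beta(omega_0, 1.0, 0.2))
```

An event sitting exactly at the cut is shared equally between the two
sectors, and the sharing changes continuously as $\omega_0$ moves
across the regularization interval.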

For simplicity we assume that $\min_Q \Omega[P,Q]$ on configurations
with K + 1 jets does not exceed
$\omega_{\mathrm{cut}} - \tfrac{1}{2} r$. Otherwise the construction
is to be iterated in the same spirit. (This is not likely to be
needed often because $\min_Q \Omega[P,Q]$ as a function of K
decreases rather fast.)

Now, in the K-jet sector, define $Q_k$ and the corresponding weights
$W_k$ as in Sec. 9.10 except that the sum of $W_k$ is normalized to
$\alpha$ rather than 1. Perform a similar procedure in the
(K + 1)-jet sector with the only modification that the sum of weights
is normalized to $\beta$. Consider the collection of all jet
configurations thus found together with their weights. The total sum
of weights is equal to 1 by construction. The regularized f is
obtained according to 9.6 where now the summation runs over jet
configurations with different numbers of jets.

If $\varphi$ in 9.3 incorporates a jet-number cut then one may choose
to drop from the r.h.s. of 9.7 the jet configurations which do not
satisfy the cut. The weights $W_k$ are evaluated prior to application
of the cut, and the weights of the jet configurations retained in 9.7
are not affected thereby. The relation 9.9 is no longer valid, and
the value of the normalizing factor, N, has to be remembered
separately. (This is similar to how luminosity may have to be
measured via special independent procedures rather than counting
events for which a collision with a high transverse momentum
occurred.)



Regularization by variations of R

9.20

An interesting variation is to evaluate jet configurations for each
event for a sequence $R_n$ of values of the jet radius parameter R,
e.g. a few values around the standard value R = 1 (recall 8.17). For
instance, $R_1 = 1 - \epsilon$, $R_2 = 1$, $R_3 = 1 + \epsilon$ with
some $\epsilon$.

This is motivated by the formal meaning of the parameter R (see
Eq. 6.28 and the discussion around it), which suggests performing an
averaging over R.



This option may be useful because events with clearly defined jets
would tend to yield similar jet configurations for different values
of R whereas more fuzzy events would yield different jet
configurations for different n.

So if one performs, say, histogramming of events in order to de- 
tect a peak, then the events which yielded several similar jet configu- 
rations would contribute in a more "focused" fashion. 

On the other hand, the events which otherwise may have been 
entirely eliminated by selection procedures now have a chance to 
contribute their share of signal albeit with a weight < 1 . 

9.21 



Let $\omega_n$ be the value of $\Omega_{R_n}[P, Q_{\mathrm{opt}}]$,
with $Q_{\mathrm{opt}}$ found according to OJD with $R = R_n$. Let
$a_n = A(\omega_{\mathrm{cut}} - \omega_n)$ where A(x) is any
monotonically increasing function, e.g. A(x) = x. (A function such as
$A(x) = x^2$ would emphasize jet configurations which are farther
from the cut.) Renormalize $a_n$ so that their sum is equal to 1. The
values thus found are larger for n for which the optimal jet
configuration effects a better approximation of the original event.
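The normalization of the weights $a_n$ can be sketched as follows. The
sketch assumes that every $\omega_n$ lies below
$\omega_{\mathrm{cut}}$ and that A is positive for positive argument;
the function name and the sample values are illustrative.

```python
def radius_weights(omegas, omega_cut, A=lambda x: x):
    """Normalized weights a_n = A(omega_cut - omega_n) / sum_m A(omega_cut - omega_m).

    omegas: omega_n = Omega_{R_n}[P, Q_opt] for each trial radius R_n.
    Assumes every omega_n < omega_cut and A(x) > 0 for x > 0.
    """
    raw = [A(omega_cut - w) for w in omegas]
    if any(x <= 0 for x in raw):
        raise ValueError("expected omega_n < omega_cut for every n")
    total = sum(raw)
    return [x / total for x in raw]

omegas = [0.02, 0.03, 0.05]  # hypothetical values for R_1, R_2, R_3
linear = radius_weights(omegas, omega_cut=0.10)
squared = radius_weights(omegas, omega_cut=0.10, A=lambda x: x ** 2)
print(linear)
print(squared)
```

With the quadratic choice of A the configuration farthest from the cut
receives a larger share of the total weight than with the linear one.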

Then for each n, find jet configurations together with the
corresponding weights normalized in such a way that the sum of
weights for each event is equal to $a_n$. The jet configurations for
each n can be found in arbitrarily sophisticated fashion. In the
simplest case one takes one jet configuration found according to OJD
without regularizations, and sets its weight to be $a_n$.
Alternatively, several jet configurations may be found using the
regularization tricks described in Secs. 9.1 and 9.17.

In any case one ends up with a collection of jet configurations and
weights whose total sum is equal to 1, and the regularized
observables are found using 9.6.

This option could be used with conventional algorithms if one takes
the resulting jet configurations with equal weights or evaluates the
weights as described in Sec. 9.16.



Discussion 



9.22 



(i) The described three regularization tricks regulate any
observable, irrespective of its specific shape and meaning. For
instance, $\varphi(Q)$ could be the (integer) number of dijets from Q
whose mass belongs to some interval (bin) on the real axis. The
corresponding regularized observable $f_{\mathrm{reg}}$ takes
non-integer values but its continuity is exactly what is needed to
suppress fluctuations induced by detector errors.

(ii) One may consider replacing the prescription 9.6 by the following
one. Let $z^{(k)}_{aj}$ be the recombination matrices corresponding
to the local minima $Q_k$. Then define

$$z^{(\mathrm{reg})}_{aj} = \sum_{k=0,1,\ldots} W_k\, z^{(k)}_{aj}. \qquad (9.23)$$

If Eq. 9.5 holds, then this is a correct recombination matrix
corresponding to some jet configuration $Q_{\mathrm{reg}}$. One may
be tempted to accept it as the resulting jet configuration. Since the
corresponding recombination matrix 9.23 has fractional matrix
elements, $Q_{\mathrm{reg}}$ can vary continuously in response to
deformations of the original event. Then one would define the
regularized observables $\varphi_{\mathrm{reg}}(Q)$ to be simply
$\varphi(Q_{\mathrm{reg}})$.

Unfortunately, such an interpretation is only valid if the 
condition 9.5 holds, and the important option of regularizing 
the cut involved in the definition of jets, Eq.5.19, must still 
follow the scheme of Eq.9.8. 

Furthermore, the regularization effect for discontinuous
$\varphi(Q)$ (cf. the example 4.42) is weaker here compared with the
prescription 9.8. This is because the values $\varphi(Q_{\mathrm{reg}})$ still jump
in response to variations in P, although less erratically thanks
to the more stable $Q_{\mathrm{reg}}$ as a function of P.

(iii) The available experience seems to indicate that the values
of $\Omega$ at different local minima (if there are any) for the
same event may exhibit a significant spread. This means that 
local minima with values close to the value at the global mini- 
mum occur rarely. 

(iv) The regularization tricks that yield a mixture of jet con- 
figurations with different number of jets may help to extract 
signal from events that would otherwise be dropped owing to 
the jet-number cut. For instance, suppose one looks at some 
process with 4 jets in the final state. Then events that would 
normally be counted as 3-jet events may, with regularization 
tricks, yield meaningful 4-jet configurations (with fractional 
weights < 1). And vice versa: events that would normally be 
counted as having 4 jets but with some pairs of jets close, 
would "spill" some of their content into the 3-jet sector. The 
net effect here is equivalent to a relaxation of the rigid con- 
ventional jet-number cuts. 

• All in all, the described regularization schemes are equiva- 
lent to a more sophisticated representation of the event's 
physical information — a representation in terms of several 
weighted jet configurations whose number may fluctuate de- 
pending on the event's fuzziness, etc. Such a representation 
preserves more information about the original event than any 
one jet configuration. This is especially true for the events with 
jets that are hard to resolve. The conventional jet-finding 
schemes correspond to enforcing a choice of one jet configura- 
tion even in situations when the choice is not clear-cut, and the 
jet configuration chosen may be a wrong one. On the other 
hand, with a regularized choice, the "correct" jet configuration 
will have chances to survive the jet-number cut, perhaps, with 
a fractional weight. 

Comparison with conventional algorithms 10 

The proposition 8.16 establishes that OJD is essentially a 
cone algorithm with an inclusive treatment of soft energy 
(Sec. 6.9) rewritten in terms of thrust-like shape observables 
(Sec. 8.11). In this section we compare OJD with the conventional
jet algorithms in a more systematic manner using the
criterion of Sec. 5.26 for guidance. 

There are two widely used classes of jet algorithms developed
by trial and error: cone and recombination algorithms.
We will consider them in turn (Sec. 10.1 and Sec. 10.9, respec- 
tively). 



Comparison with cone algorithms 10.1 



Cone algorithms were introduced in [8] and define jets in a 
purely geometric fashion using cones of a fixed shape and an- 
gular radius R, so that the finding of jets reduces to finding the 
number and positions of the corresponding cones. 

Cone positions are found via some kind of iterative proce- 
dure. We note that such an iterative search procedure can al- 
ways be interpreted as a search of a minimum of some implic- 
itly defined function on jet configurations; the function is para- 
metrized by the event. It is clear that in general such a function 
may have many local minima, similarly to what was observed 
for OJD in Section 9. 

The choice of initial configuration to start iterations from is 
not fixed by scientific considerations. Depending on how one 
makes this choice, one ends up with different jet configurations 
in the end. It is not difficult to realize that: 



The problem of choosing the initial configuration, which has as
a consequence non-uniqueness of the resulting jet configuration,
represents a vicious circle in the definition of cone algorithms. To
break it one needs an extraneous principle, which for conventional
algorithms is usually replaced by a convention.

10.2

In the case of OJD one simply opts for the global minimum 
of a well-defined shape observable, which corresponds to 
minimization of the information loss incurred in the transition 
from events to jets. It should be emphasized that the candidate 
jet configurations of the cone algorithms correspond to the lo- 
cal minima of OJD — not the degenerate global minima which 
occur much less often and to handle which our theory provides 
simple options (Section 9). 

The termination condition for the cone algorithm is usually 
ad hoc too. For instance, the algorithm may seek to make the 
cone axes coincide with the corresponding jets' 3-momenta 
[29]. 

The original proposition of [8] was to minimize the energy 
left outside all jet cones, which is similar to the mechanism of 
OJD (8.18; cf. Eq. 10.7). However, the algorithm of [8] is algo- 
rithmically inconvenient, so the currently used variations [2] 
abandon the theoretically preferred inclusive treatment of soft 
energy (Sec. 6.9) in favor of lower cuts on energies of candidate 
jets (the so-called 'f'-cuts). 

Note also how a scientific consideration is sacrificed here in 
favor of convenience of implementation of an ad hoc scheme, 
whereas with OJD, the theoretically preferred treatment of soft 
energy (Sec. 6.9) also leads to a simpler, faster and more robust 
computer code [7]. 

A murky problem specific to cone algorithms is how to treat
cone overlaps. It remains essentially unsolved because of a lack
of a guiding principle beyond the basic boundary condition
4.33. For this reason one usually resorts here to ad hoc
conventions.

The mechanism represented by the parameter R in OJD in- 
dicates its similarity to the conventional cone algorithms. The 
similarity is further exhibited by the algorithmic implementa- 
tion of OJD described in [7]. 

We conclude: 



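(Footnote on cone and recombination algorithms: see also sections
5.2.1-5.2.2 of [1], where they are called cluster and
combination algorithms, respectively.)

(Footnote on ad hoc conventions for cone overlaps: perhaps, after
intense discussions in a working group ;)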
OJD is a cone algorithm in disguise with jet shapes determined
(and jet overlaps handled) dynamically by means of a shape observ- 
able taking into account the distribution of energy in jets. 

10.3 

(See also 8.16 and 8.17.) 

One can obtain a less optimal (in the sense of Sec. 5.26) jet
definition via a cruder estimate for $\Omega$, but such as would be
closer to the cone algorithms. It is easy to obtain the following
simple upper bound for the fuzziness of the j-th jet:

$$ \Upsilon_j[P,Q] \le \mathcal{E}_j R_j^2 , \qquad R_j = \max_{a \in j} \theta_{aj} , $$   10.4

where the maximum is evaluated over all particles contributing
to the jet, so that $R_j$ is interpreted as the jet's radius. The
resulting less optimal variant of the criterion would be

$$ \sum_j \mathcal{E}_j \, (R_j/R)^2 + E_{\mathrm{soft}} . $$   10.5

The mechanism of minimization is more transparent here than
in the non-simplified case: take a particle from one jet and
move it to another or to soft energy. Then the criterion 10.5
would decrease or increase depending on the induced changes
in the two jets' radii $R_j$.

An even cruder version is obtained via the following upper
bound for Eq. 10.5:

$$ \sum_j \mathcal{E}_j \, (R_j/R)^2 + E_{\mathrm{soft}} \le (E_{\mathrm{tot}} - E_{\mathrm{soft}}) \max_j (R_j/R)^2 + E_{\mathrm{soft}} . $$   10.6

So one could define a jet finding scheme similarly to OJD but
based on the following shape observable which is the same as
the r.h.s. of 10.6:

$$ (E_{\mathrm{tot}} - E_{\mathrm{soft}}) \max_j (R_j/R)^2 + E_{\mathrm{soft}} . $$   10.7

In this variant jets' radii ignore the details of the energy
distribution between particles, as in the conventional cone
algorithms.

Verbally: the criterion 10.7 would attempt to include as
much energy as possible into as few jets as possible with the
jets' radii as narrow as possible but not exceeding R (as with
OJD, a particle farther than R from any jet's axis is relegated
to soft energy). If the event consists of non-overlapping sprays
of particles with angular radii (measured from the spray's
3-momentum) not exceeding R, the criterion 10.7 will find jets
in one-to-one correspondence with the sprays.

In the absence of jet overlaps, the mechanism of the criterion
10.7 is essentially equivalent to the original cone algorithm of
Sterman and Weinberg [8], and similarly to the latter, it handles the soft
energy in a theoretically correct fully inclusive fashion.

Unlike the algorithm of [8], the criterion 10.7 does not require
additional prescriptions to handle jets' overlaps.

10.8

The criterion 10.7 may be implemented similarly to the
simplex method [36]. Unfortunately, the analytical structure of
10.5 and 10.7 does not seem to allow the tricks which contributed
to the efficiency of the implementation of OJD described
in [7].

A general conclusion is that the conventional cone algorithms
are non-optimal (in the sense of Sec. 5.26) whenever jet
energies exhibit a significant variation.



Comparison with recombination algorithms 10.9

Recombination algorithms emerged in the context of Monte
Carlo hadronization models (the Luclus algorithm [30]) with
inversion of hadronization as a primary motivation, apparently.
The recombination scheme was popularized by the JADE algorithm
[31], and subsequently improved by the $k_T$/Durham [32]
and Geneva [33] variations.



General discussion 10.10



A recombination algorithm iteratively replaces a pair of 
particles by one (pseudo) particle using some criterion to de- 
cide whether a given pair is to be recombined or left as is. 

There are three problems here — all similar to what one 
encounters with cone algorithms. 

One problem is the treatment of soft energy, and everything 
said about it in the context of cone algorithms is applicable 
here (the theoretically preferred inclusive treatment is aban- 
doned owing to a conflict with an ad hoc algorithmic scheme). 

Another problem is the lack of any firm principle to determine
the order of recombinations. Intuition suggests that closest
neighbors should be recombined first but with O(100) particles
in the event, there is still much choice. Similarly, one
may start recombinations with the most energetic particles.
This prescription is actually borne out by the analogy with OJD:
as is seen from the expression 6.20, starting to collect particles 
into jets from the most energetic ones allows one to focus from 
the very beginning on jet configurations in which largest con- 
tributions — those from the most energetic particles — are 
suppressed. 

However, selecting the most energetic particles is a non-IR 
safe procedure, so some preclustering is needed, which cannot 
be too coarse. This introduces an undesirable non-inclusivity 
which may result in an enhancement of power corrections. 

These ambiguities take the place of the problem of choosing
initial conditions for the cone algorithms, so that the jet
definitions based on recombination algorithms also contain a
vicious circle similar to the one pointed out for cone algorithms
(10.2).

The third problem concerns the choice of the recombination 
criterion used to decide when two particles are to be recom- 
bined into one (this has a parallel in the problem of handling 
overlapping cones in the case of cone algorithms). There seems 
to be a consensus emerging about a preferential status of the $k_T$
criterion [32] as enabling better theoretical calculations. 



2 → 1 recombinations 10.11



Let us now take a closer look at recombination criteria. 
To begin with, we note that a series of recombinations

$$ P \to P' \to P'' \to \dots \to Q , $$   10.12

is naturally interpreted in the framework of the definition 5.9
as a series of approximations

$$ f(P) \approx f(P') \approx f(P'') \approx \dots \approx f(Q) , $$   10.13

so that each recombination can be analyzed within the frame- 
work of the developed theory. 

It is perfectly obvious that even if one performs each 2 -> 1 re- 
combination in an optimal way, the scheme 10.12-10.13 in general 
causes an accumulation of additional errors (instabilities) compared 
with a global optimization such as done in OJD. 

10.14 
To appreciate this, recall how economically cancellations were
arranged in our derivation of the corresponding error estimates; 
cf. e.g. Eq.6.5. 

Let us now obtain the 2 → 1 recombination criterion which
corresponds to OJD. Consider how it treats a narrow spray
consisting of two particles a and b. In this case one sees from
6.20 that if one combines them into one jet j then

$$ \Upsilon_j^{\mathrm{opt}} \approx E_a E_b \, (E_a + E_b)^{-1} \, \theta_{ab}^2 . $$   10.15

This is exactly the geometric mean of the JADE criterion [31],

$$ \Upsilon_j^{\mathrm{JADE}} = E_a E_b \, \theta_{ab}^2 , $$   10.16

and the Geneva criterion [33],

$$ \Upsilon_j^{\mathrm{Geneva}} = E_a E_b \, (E_a + E_b)^{-2} \, \theta_{ab}^2 . $$   10.17



(Remember that in our case all energies are fractions of the 
total energy of the event.) 

Furthermore, when recombining pairs of soft particles, the
JADE criterion underestimates 10.15 and thus would tend to
combine them into spurious jets, which is exactly the problem that
gave rise to the Geneva [33] and $k_T$/Durham [32] variations.
The Geneva and $k_T$ criteria (as well as the earlier Luclus
criterion [30]) overestimate 10.15 in such cases. From the
viewpoint of the developed theory, this is indicative of their
non-optimality but is otherwise safe (overestimating induced errors
is not dangerous). 
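
The relation between the three criteria is easy to check numerically. The sketch below uses the small-angle forms of 10.15-10.17 with energies expressed as fractions of the total event energy. The function names are illustrative, and overall numerical coefficients of the conventional criteria are ignored.

```python
import math


def jade(ea, eb, theta):
    """JADE-type measure, Eq. 10.16 form: E_a E_b theta^2."""
    return ea * eb * theta ** 2


def geneva(ea, eb, theta):
    """Geneva-type measure, Eq. 10.17 form: E_a E_b theta^2 / (E_a + E_b)^2."""
    return ea * eb * theta ** 2 / (ea + eb) ** 2


def ojd_pair(ea, eb, theta):
    """Two-particle fuzziness, Eq. 10.15 form: E_a E_b theta^2 / (E_a + E_b)."""
    return ea * eb * theta ** 2 / (ea + eb)


# A pair of soft particles: energy fractions 2% and 3%, opening angle 0.3.
ea, eb, theta = 0.02, 0.03, 0.3
values = (jade(ea, eb, theta), ojd_pair(ea, eb, theta), geneva(ea, eb, theta))
geometric_mean = math.sqrt(values[0] * values[2])
```

Because the energy fractions of a pair sum to at most 1, the JADE form lies below the two-particle fuzziness and the Geneva form lies above it, with the fuzziness equal to their geometric mean.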

To further compare Eq. 10.15 with the $k_T$ criterion, rewrite
the latter by normalizing to the total energy and taking square
root to achieve first order homogeneity in energies:

$$ (E_a + E_b) \, \min(x, 1-x) \, \theta_{ab} , $$   10.18

where $x = E_a (E_a + E_b)^{-1}$. Eq. 10.15 in similar notations
becomes

$$ \Upsilon_j^{\mathrm{opt}} \approx (E_a + E_b) \, x (1-x) \, \theta_{ab}^2 . $$   10.19

[Fig. 10.20]

The difference in the x-dependence is inessential (cf. Fig. 10.20;
one can bound one function by the other times a coefficient)
unlike the angular dependence which is qualitatively different.
A tentative conclusion from 10.18 and 10.19 would be that the $k_T$
criterion would tend to yield more jets at smaller angular sepa- 
rations than the variant 10.19. It is thus less optimal than 10.19 
in the sense that in general it requires more jets to ensure that 
the same amount of information from the event is preserved in 
the resulting jet configuration. 
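
A quick numerical check of the two statements about 10.18 and 10.19 is sketched below. It assumes the forms $(E_a+E_b)\min(x,1-x)\,\theta_{ab}$ and $(E_a+E_b)\,x(1-x)\,\theta_{ab}^2$ for the two measures; the function names and the grid are illustrative.

```python
def kt_measure(ea, eb, theta):
    """Eq. 10.18 form: (E_a + E_b) * min(x, 1 - x) * theta."""
    x = ea / (ea + eb)
    return (ea + eb) * min(x, 1.0 - x) * theta


def ojd_measure(ea, eb, theta):
    """Eq. 10.19 form: (E_a + E_b) * x * (1 - x) * theta**2."""
    x = ea / (ea + eb)
    return (ea + eb) * x * (1.0 - x) * theta ** 2


def x_bounds_hold(n=100, tol=1e-12):
    """Check x(1-x) <= min(x, 1-x) <= 2 x(1-x) on a grid of x in [0, 1]."""
    for k in range(n + 1):
        x = k / n
        low, mid, high = x * (1 - x), min(x, 1 - x), 2 * x * (1 - x)
        if not (low <= mid + tol and mid <= high + tol):
            return False
    return True


# The ratio of the two measures scales as 1/theta: the angular dependence differs.
ratio_small = kt_measure(0.3, 0.2, 0.1) / ojd_measure(0.3, 0.2, 0.1)
ratio_large = kt_measure(0.3, 0.2, 0.2) / ojd_measure(0.3, 0.2, 0.2)
```

The x-dependences differ by at most a factor of 2, whereas the ratio of the measures grows without bound as the angle decreases, which is the qualitative difference referred to in the text.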

Of course, the latter property need not necessarily be a 
drawback because it ensures that the shape of the K-jet sectors 
in the space of events (4.35) is qualitatively different here 
compared with OJD, and this may be useful in practice (see the 
discussion in Sec. 5.1). 

• On the other hand, the advantage of the $k_T$ criterion over
other conventional schemes (better theoretical predictions) 
seems to be overshadowed by an ultimate amenability to theo- 
retical analyses of the shape observables in terms of which 
OJD is formulated. 

Recall also the remarks in Sec. 5.31 concerning how dy- 
namical information could be incorporated into OJD. 



The last remark concerning the $k_T$ algorithm is as follows. It
is an attempt to make use of the theoretical pQCD results to
improve upon the recombination scheme and is obviously motivated
by the fact that the kinematics of 2 → 1 recombinations
makes irresistible an inclusion into the picture of theoretical 
results such as the Sudakov formfactor. However, this per se 
can hardly be regarded as a justification for the recombination 
scheme as such and does not correct its fundamental deficiency 
— the ambiguity of the order of recombinations (see Sec. 10.9). 

The point here is not that QCD should be ignored in the con- 
struction of jet algorithms but that the recombination scheme may 
not be the best receptacle to pour dynamical QCD wisdom into. 

10.21 

Non-uniqueness of jet configurations
and the meaning of $\omega_{\mathrm{cut}}$ 10.22

The above analysis indicates that the conventional algo- 
rithms behave as imperfect heuristics for the minimization 
problem in OJD. This observation reveals an interesting point, 
namely, existence of a source of errors entirely specific to con- 
ventional algorithms and uncorrelated for different algorithms. 

Indeed, consider the possible existence of several local minima
of $\Omega_R$ (when the event does not appear to have well-defined
jets at a given resolution $\omega_{\mathrm{cut}}$). The optimal algorithm simply
repeats the search from different initial configurations (e.g.
randomly generated), and if it finds more than one local minimum
then the global minimum is selected simply by comparison
of the corresponding values of $\Omega_R$.
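
The selection of the global minimum by repeated search can be sketched generically. In the sketch below `objective`, `local_search` and the toy landscape are stand-ins invented for illustration. In an actual application the objective would be $\Omega_R$ and the local search would be a minimizer such as the one of [7].

```python
def global_minimum(objective, local_search, starts):
    """Run a local search from each start; return (best, all_found) as (value, point)."""
    found = []
    for start in starts:
        point = local_search(start)
        found.append((objective(point), point))
    best = min(found, key=lambda pair: pair[0])
    return best, found


# Toy landscape with two local minima (indices 1 and 5); index 5 is the global one.
landscape = [3.0, 2.0, 2.5, 4.0, 1.5, 1.0, 1.2, 3.0]


def descend(i):
    """Greedy descent on the toy landscape: move to the lower neighbor until stuck."""
    while True:
        neighbors = [k for k in (i - 1, i + 1) if 0 <= k < len(landscape)]
        nxt = min(neighbors, key=landscape.__getitem__)
        if landscape[nxt] >= landscape[i]:
            return i
        i = nxt


best, found = global_minimum(landscape.__getitem__, descend, range(len(landscape)))
local_minima = sorted({point for _, point in found})
```

The starts could equally be drawn at random. What matters is only that the candidates are compared by the value of the same objective, which is the criterion the conventional algorithms lack.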

It is not difficult to realize that situations with several local 
minima seen by OJD have an immediate analog in the situa- 
tions where the conventional algorithms find different configu- 
rations depending on minor algorithmic variations such as the 
choice of initial configuration or the order of recombinations. 

The conventional algorithms, however, provide no criterion 
to select the best configuration from several such candidates: 
Any ad hoc prescription amounts to a more or less random 
choice — and from the viewpoint of OJD, a random choice of 
the local minimum results in a jet configuration which may in- 
herit less information from the initial event than is actually 
possible. In other words, the use of conventional algorithms 
implies a systematically larger loss of information compared 
with OJD. 

The instability of the found configuration of jets which re- 
sults from random choice of a local minimum is due to the sto- 
chastic nature of hadronization and is manifest on a per event 
basis. 

OJD would similarly fail in situations with several global 
minima but such situations occur with theoretical probability 
zero, and if they do become important due to detector errors, 
there are specific prescriptions to regularize the corresponding 
instabilities (Section 9). 

To summarize: 

(i) Ambiguities of conventional algorithms are an additional 
source of errors in physical results — additional compared 
with the theoretically optimal behavior of OJD. 

(ii) Then OJD is preferable over conventional schemes in
proportion to how the number of events with more than one local
minimum of $\Omega_R$ exceeds those with several global minima
(taking into account various experimental and theoretical
uncertainties). Events with several local minima seem to proliferate
in proportion to importance of higher-order and
power-suppressed corrections.

(iii) It should be possible to quantify these effects by deter- 
mining the fraction of events with several local minima 
(checking along the way how occurrence of several local min- 
ima is reflected in ambiguities of results of conventional algo- 
rithms), and the fraction of events with several global minima 
(modulo various sources of uncertainties taken into account via 
[simple] models). 

(iv) We are also compelled to conclude that working with not 
too small values of the parameters such as $\omega_{\mathrm{cut}}$ and $y_{\mathrm{cut}}$
imposes a fundamental limit on the potentially attainable
precision of interpreted physical information such as parameters of
the Standard Model, obtained via intermediacy of jet algo- 
rithms, although the potential numerical magnitude of the ef- 
fect (or rather defect) remains unclear. This has a simple ex- 
planation: 



The parameter $\omega_{\mathrm{cut}}$ and the similar parameters of the
conventional algorithms actually describe the errors induced in the
physical information by the approximate description of events in
terms of jets. 

10.23 



Cf. the estimate 5.18 from which OJD 5.20 is derived. 

• The conclusion 10.23 has to be taken into account when 
comparing results of different jet algorithms. This may help to 
explain the finding that the prospective dominant error in the 
planned top mass measurements at the LHC is due to the am- 
biguities of jet definition [34]. It may be possible to reduce 
such an error using the methods described in Section 9. 

• The conclusion 10.23 is also to be kept in view when
comparing results obtained using the same jet algorithm but
different event samples (e.g. CDF and D0).

Conclusions 11 

The discovered optimal jet definition (OJD; it is summarized
in Sec. 7.16) is essentially a cone algorithm (Sec. 8.14)
entirely reformulated in terms of shape observables (the
fuzziness; Sec. 8.1) which generalize the well-known thrust to the
case of any number of thrust semi-axes (Sec. 8.11). The cone
shapes and positions are determined dynamically via minimization
of the fuzziness. The soft energy is treated inclusively
via a cumulative cut on the soft energy (Sec. 6.9), which is
similar to the original prescription of Sterman and Weinberg
[8] but differs from the currently preferred 'f'-cuts [2].

The criterion is controlled by two parameters: $R$ and $\omega_{\mathrm{cut}}$.
The parameter $R$ sets an upper limit on the maximal angular
radius of jets (Sec. 8.14). The parameter $\omega_{\mathrm{cut}}$ effectively sets an
upper bound on the soft energy allowed to be left out of jet
formation, but its primary role is to control the loss of
information entailed by the transition from the event to jets
(Sec. 5.17).

The synthesis of OJD 11.1 

It is rather remarkable that OJD turns out to be a smooth 
blend of many things and tricks tried in the practice of jet algo- 
rithms. 

We have already noted that it is essentially a cone algorithm 
rewritten in terms of thrust-like shape observables. It even
allowed us to obtain a shape-observables-based analog of the
original cone algorithm of [8] (Eq. 10.7). 

Nor is the idea of jet finding via a global optimization new:
such a version of recombination algorithms was earlier
explored in [35].

Curiously, OJD yields for each jet what we called the physical
4-momentum $q_j$ with $q_j^2 > 0$ (Eq. 7.1), and simultaneously
a light-like 4-vector $\tilde{q}_j$ (Eqs. 7.9 and 7.15), both closely
related and playing an important role in the definition; cf. 7.10.
Note in this respect that the variants of cone algorithm used by
D0 and CDF yield, respectively, massive and massless jets [2],
which can be associated with $q_j$ and $\tilde{q}_j$.

The options for inclusion of dynamical QCD information are
also available although in a different form than with the $k_T$
algorithm: via dependence of $\omega_{\mathrm{cut}}$ on events (Sec. 5.31) and via
theoretical analyses of the shape observables $\Omega_R$ (Sec. 8.1).

Even the recombination scheme — although it did not find a 
visible place in OJD — can still be regarded as a heuristic for 
minimum search (Sec. 10.11). 

The only important element of the conventional schemes not 
incorporated in OJD is the lower ('f'-) cuts on jets' energies.
Stray soft particles are now handled via an inclusive energy cut 
(8.18). 

What OJD derives from 11.2

A widely held opinion (cf. [2]) is that the definition of jets 
is subjective in nature. The developed theory shows that it is 
not quite so. 

The important ingredient which has been missing from the 
conventional discussions of jet definition (it would be mis- 
leading to use the word theory here) is the information analysis 
of the problem of jet definition. Our analysis is based on an 
earlier groundwork [4] which emphasized a purely kinematical 
viewpoint on jet algorithms as approximation tricks rooted in 
— but not identical with — the dynamics of QCD. 

The most important clarification of the theory of [4] ob- 
tained in the present paper is the notion of optimal observables 
for measurements of fundamental parameters (Secs. 2.7 and
4.19). The notion (together with the resulting practical pre- 
scriptions, Sec. 2.25) provides a guidance for a systematic im- 
provement upon the conventional scheme of measurements 
based on the notion of jets (Sec. 4.28; cf. the new options de- 
scribed in Section 9). 

The notion of optimal observables allows one to interpret 
the event's information content (which is the basis of OJD, 
Section 5) in the light of the fundamental Rao-Cramer ine- 
quality of mathematical statistics (Sections 2, 4.19 and 5.26). 

The general considerations which went into the derivation 
of OJD are as follows: 

1. A systematic reliance on first principles of physical
measurement, quantum field theory and QCD. 

2. Avoidance of ad hoc choices not fortified by strictly 
analytical arguments. 

3. The requirement that the jet configuration must in- 
herit maximum information from the event. 

4. Conformance to the Snowmass conventions in re- 
gard of kinematics of hadron collisions. 

5. Maximal computational simplicity. 

A remarkable fact is that other properties usually postulated 
for jet algorithms emerge as mere consequences of the
requirement of computational simplicity, which fact provides
their ultimate justification thus replacing the usual aesthetic 
arguments: 

6. Energy-momentum conservation in the formation of
jets from particles (Secs. 6.15, 7.3).

7. Conformance to relativistic kinematics (Sec. 7.7).

8. Maximal inclusiveness of the criterion (needed to reduce
sensitivity to hadronization effects which correspond
to higher order logarithmic and power corrections).

Lastly, and most surprisingly, the found criterion possesses
a property which is naturally interpreted as

9. An optimal inversion of hadronization (Sec. 5.10).

Mutable and immutable elements of OJD 11.3

One has to distinguish between the handling of a single 
event and the construction of observables for collections of 
events. 

At the level of an individual event, all the arbitrariness is in
the form of $\Omega$. There is not much room left for modifications of
the $\Omega$ as given by 6.27 ∪ 7.10 ∪ 6.8. The internal structure of
$\Upsilon$ and $E_{\mathrm{soft}}$ does not seem to allow meaningful modifications.
So the only reasonable option might have been in how $\Upsilon$ and
$E_{\mathrm{soft}}$ are combined into a single quantity; it is represented by
Eq. 8.26. It results in only marginal computational complications
but seems to make theoretical analyses more difficult
without offering clear advantages (see the discussion in
Sec. 8.25).

At the level of collections of events things are more interesting.
In particular, making $\omega_{\mathrm{cut}}$ depend on the event P is the
way to include dynamical QCD information into the picture 
(Sec. 5.17). However, lifting the restrictions of the standard 
scheme 4.38 (cf. Section 9) may be more important in the end 
(cf. the remarks after Eq.5.33). 

To summarize: the form of $\Omega_R$ (6.27 ∪ 7.10 ∪ 6.8) is the
least mutable element of the described scheme, so that all
variations would use as the main building block a minimization
procedure for $\Omega$ (such a procedure is provided in the code
developed in [7]).

The most interesting variations (i.e. those which allow im- 
provements of the conventional scheme 4.38 in the direction of 
constructing better approximations of the ideal optimal observ- 
ables 4.20) concern the definition of observables on collection 
of events (see Section 9 for tricks to start from). 

The simplest universal (i.e. dynamics-agnostic) optimal jet 
definition is based on the linear choice 6.27 (B = 1 in 8.26) and 
an event-independent $\omega_{\mathrm{cut}}$. This is closely related to the way
the conventional cone algorithms are defined and may be ac- 
cepted, in the context of the developed theory, as a default 
definition for all comparisons. 

In short, the theory of OJD only deals with the function to
be minimized in order to find jets (the fuzziness $\Omega$, 6.27 ∪ 7.10
∪ 6.8), but it only provides guiding principles (the method
of quasi-optimal observables; Sec. 2.25) for how observables 
are to be constructed. It is up to the user to decide whether to 
stick to the conventional scheme 4.38 or go beyond its limita- 
tions using e.g. the tricks of Section 9. 



Remarks on implementation 11.4

That OJD (summarized in Sec. 7.16) is fully constructive is
in itself rather wonderful given that it was derived in a 
straightforward fashion from the seemingly innocent (to the 
point of appearing meaningless) criterion 5.9 which, however, 
only accurately expresses a fundamental idea implicit in the jet 
paradigm — that the configuration of jets inherits the essential 
physical information of the corresponding event (5.8). 

Computationally, the problem of finding jets in our formulation
reduces to finding the recombination matrix $z_{aj}$ which
minimizes $\Omega[P, Q]$ given by 6.27. For an event which lit up
150 detector cells and contains 4 jets, $z_{aj}$ has 600 independent
components, so that one has an optimization problem in a do- 
main of a very large dimensionality. Such problems are notori- 
ously difficult. Fortunately, the analytical simplicity of both the 
function to be minimized and the regularity of the domain in 
which the minimum is to be found (a direct product of standard 
simplices, one simplex per particle; cf. 6.3, 6.4) can be effec- 
tively employed to design an efficient algorithm [7]. 
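
The size and shape of the search domain can be illustrated with a short sketch. It assumes that each particle's row of $z_{aj}$ lies in the standard simplex $z_{aj} \ge 0$, $\sum_j z_{aj} \le 1$ (which is how 6.3 and 6.4 are read here), with the remainder $1 - \sum_j z_{aj}$ going to soft energy. The function names are illustrative.

```python
def interior_start(n_particles, n_jets):
    """Uniform interior point: every particle shares its energy equally
    between the n_jets jets and the soft energy."""
    share = 1.0 / (n_jets + 1)
    return [[share] * n_jets for _ in range(n_particles)]


def in_domain(z, tol=1e-12):
    """True if every row lies in the simplex z_aj >= 0, sum_j z_aj <= 1."""
    return all(min(row) >= -tol and sum(row) <= 1.0 + tol for row in z)


z = interior_start(n_particles=150, n_jets=4)
n_components = len(z) * len(z[0])  # 150 particles x 4 jets
```

For 150 particles and 4 jets this gives the 600 independent components quoted in the text, and the uniform point lies strictly inside the domain, so it can serve as a starting point for a descent.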

Although the minimization algorithm of [7] was obtained 
from purely analytical considerations (a variant of the gradient 
search which makes a heavy use of the analytical specifics of 
the problem) plus some experimentation, a posteriori it is natu- 
rally interpreted as follows: 

— the algorithm starts with some (perhaps randomly gener- 
ated) distribution of particles between jets; 

— the jets perform iterative "negotiations" by considering par- 
ticles one by one and deciding if and how their energy should 
be redistributed between the jets and the soft energy in order to 
improve upon the current configuration; 

— the algorithm stops when no particle can be further redistributed
to decrease $\Omega$.

This is reminiscent of the iterative adjustment of jets' posi- 
tions in the cone algorithms. However, the jets' axes and 
shapes are specified in the conventional algorithms directly, 
and in the optimal criterion, indirectly via the recombination 
matrix. 
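
The three steps of the "negotiation" can be mimicked by a deliberately simplified toy. This is not the algorithm of [7]: it assumes hard assignments ($z_{aj}$ equal to 0 or 1), particles given as points in a flat angular plane, jet axes taken as energy-weighted centroids, and a small-angle stand-in $\Omega = \Upsilon/R^2 + E_{\mathrm{soft}}$ with $\Upsilon = \sum_j \sum_{a \in j} E_a\,|x_a - c_j|^2$. All names are hypothetical.

```python
import random


def omega(labels, energies, points, n_jets, R):
    """Toy criterion Upsilon / R**2 + E_soft; label -1 means soft energy."""
    e_soft = sum(e for e, lab in zip(energies, labels) if lab < 0)
    upsilon = 0.0
    for j in range(n_jets):
        members = [i for i, lab in enumerate(labels) if lab == j]
        e_jet = sum(energies[i] for i in members)
        if e_jet == 0.0:
            continue
        # Jet axis: energy-weighted centroid of the jet's particles.
        cx = sum(energies[i] * points[i][0] for i in members) / e_jet
        cy = sum(energies[i] * points[i][1] for i in members) / e_jet
        upsilon += sum(
            energies[i] * ((points[i][0] - cx) ** 2 + (points[i][1] - cy) ** 2)
            for i in members
        )
    return upsilon / R ** 2 + e_soft


def negotiate(energies, points, n_jets, R, seed=0):
    """Start from a random assignment, then move particles one by one
    between jets and soft energy while the criterion decreases."""
    rng = random.Random(seed)
    labels = [rng.randrange(-1, n_jets) for _ in energies]
    best = omega(labels, energies, points, n_jets, R)
    improved = True
    while improved:
        improved = False
        for i in range(len(energies)):
            for candidate in range(-1, n_jets):
                if candidate == labels[i]:
                    continue
                old = labels[i]
                labels[i] = candidate
                trial = omega(labels, energies, points, n_jets, R)
                if trial < best - 1e-12:
                    best, improved = trial, True   # keep the move
                else:
                    labels[i] = old                # undo the move
    return labels, best


# Two equally energetic particles far apart (separation 10 in units of R).
labels, value = negotiate([0.5, 0.5], [(0.0, 0.0), (10.0, 0.0)], n_jets=2, R=1.0)
```

In this two-particle example each particle ends up in its own jet and the criterion vanishes. With more particles such a greedy search can stop at a local minimum, so in practice it would be repeated from several initial configurations.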

Feasibility of implementation of OJD is thus not an issue. 

• A liberating consequence of the jet definition via minimiza- 
tion of a simple function is that a specific implementation of 
the minimum-finding algorithm is of no consequence whatever 
(physical or other) provided it yields the optimal jet configura- 
tion with required precision. Thus different groups of physi- 
cists are free to explore their favorite algorithms — from sim- 
plest low-overhead methods for theoretical computations with a 
few partons, to neural, genetic, Dantzig's [36], equidomoidal
[37], . . . algorithms for experimental data processing — as long 
as they minimize the same criterion and control approximation 
errors sufficiently well in doing so. This would be a truly satis- 
factory way to resolve the difficulties encountered in compari- 
son of physical results from groups which use different variants 
of jet algorithms [2]. 

The criterion 6.27 tends to prefer configurations with $z_{aj}$
equal to exactly 0 or 1 (remark (ii) at the beginning of
Sec. 9.1). This makes the problem very similar to that of linear 
programming for which a vast theory exists (see e.g. [36]) 
where one can borrow ideas for more efficient or fancy imple- 
mentations. 

Note that allowing fractional values for $z_{aj}$ proves to be
extremely convenient algorithmically: the domain in which the
minimum is to be found is then a convex region, so one
chooses an internal point (which corresponds to some fractional
values) as a starting point for minimum search and then
descends into a minimum via a most direct route. 
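
The convexity of the search domain can be checked numerically. The sketch below assumes that the recombination matrix obeys z_aj ≥ 0 and Σ_j z_aj ≤ 1 for every particle a, the remainder being the particle's soft fraction. The helper names are hypothetical and serve only as illustration.

```python
def is_feasible(z, tol=1e-12):
    """True if every entry is non-negative and every row sums to at most 1."""
    return (all(x >= -tol for row in z for x in row)
            and all(sum(row) <= 1.0 + tol for row in z))

def interior_start(n_particles, n_jets):
    """Interior point: each particle split equally among all jets and soft."""
    share = 1.0 / (n_jets + 1)
    return [[share] * n_jets for _ in range(n_particles)]

def blend(z1, z2, t):
    """Point on the straight segment between z1 (t = 0) and z2 (t = 1)."""
    return [[(1.0 - t) * x + t * y for x, y in zip(r1, r2)]
            for r1, r2 in zip(z1, z2)]

# Two extreme (0-or-1) configurations for three particles and two jets.
all_in_jet_0 = [[1.0, 0.0] for _ in range(3)]
all_in_jet_1 = [[0.0, 1.0] for _ in range(3)]

# Every point on the segment joining them stays inside the domain.
segment_ok = all(is_feasible(blend(all_in_jet_0, all_in_jet_1, t / 10))
                 for t in range(11))
```

Because any such segment stays inside the domain, a descent started from `interior_start(...)` can move along straight lines and needs to handle the constraints only when it reaches the boundary.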

New options for jet-based data processing and ancillary results 11.5 

The theory of OJD offers new options for improvements 
upon the conventional scheme 4.38. Some such options are de- 
scribed in Section 9. Also the additional information about 
events contained in the parameters Υ and E_soft can be used to 
expand the phase space of jet configurations in order to en- 
hance informativeness of the resulting observables and thus 
approach the theoretical Rao-Cramer limit of the optimal ob- 
servables 4.20. 

Also the following results deserve to be mentioned here: 

(i) The usefulness of the method of quasi-optimal observ- 
ables (Sec. 2.25) goes beyond jet-related measurements. 

(ii) The Υ-E_soft distribution (Sec. 8.19) offers a new 
model-independent window on the dynamics of hadronization thus 
allowing a new class of tests of pQCD as well as theoretical 
descriptions of hadronization models. 

Advantages of OJD 11.6 

OJD — even interpreted narrowly in the context of the con- 
ventional scheme 4.38 — has the following advantages over 
the conventional algorithms: 

(i) OJD solves the problem of non-uniqueness of jet configu- 
rations which is insurmountable in the context of conventional 
schemes. It thus eliminates a source of errors entirely due to 
the structure of jet algorithms (Sec. 10.22). 

(ii) OJD extirpates the difficulties of conventional algorithms 
usually "solved" via ad hoc prescriptions (the handling of cone 
overlaps, the choice of order of recombinations, etc.). 

(iii) The shape observables on which OJD is based generalize 
the well-known thrust and are therefore superbly amenable to 
theoretical studies — evidently more so than any imaginable 
modification of the conventional schemes (cf. the pQCD cal- 
culations for the thrust reviewed in [20]). 

(iv) OJD allows independent implementations so that differ- 
ent experimental and theoretical groups only have to agree 
upon the function to be minimized (Sec. 11.4). 



Furthermore, OJD offers new options for improving upon 
the conventional jet-based data-processing scheme 4.38 as de- 
scribed in Section 9 in the direction of approaching the theo- 
retical Rao-Cramer limit on precision of extracted fundamental 
parameters (see Sec. 2.19). 

The simplest dynamics-agnostic OJD allows modifications 
to incorporate dynamical QCD information (Sec. 5.31). 

A fast and robust implementation of OJD is available as a 
Fortran code [7]. 

In conclusion, the most important result of this paper is a 
systematic analytical theory of jet definition based on first 
principles, with explicitly formulated assumptions, and with 
the logic of jet definition elucidated in a formulaic fashion. 
If one were to construct as precise as possible an approxima- 
tion to the optimal observable in a specific application then it 
is a theorem that OJD is a better tool for that than any conven- 
tional jet algorithm. However, it is not clear whether the cost 
of such an ideal solution would be justified by the resulting in- 
crease in precision of results. 

In any event, the developed systematic framework reveals 
some new options (e.g. the regularization via multiple jet con- 
figurations) which may be useful even within the conventional 
approach. 